Measuring the Fitness-for-Purpose of Requirements: An initial Model of Activities and Attributes (2024)

Julian Frattini (1st), Davide Fucci (3rd), Michael Unterkalmsteiner (4th), Daniel Mendez (5th)
Blekinge Institute of Technology, Karlskrona, Sweden
{firstname}.{lastname}@bth.se

Jannik Fischbach (2nd)
Netlight Consulting GmbH and fortiss GmbH, Munich, Germany
jannik.fischbach@netlight.com

Abstract

Requirements engineering aims to fulfill a purpose, i.e., to inform subsequent software development activities about stakeholders’ needs and constraints that must be met by the system under development. The quality of requirements artifacts and processes is determined by how fit for this purpose they are, i.e., how they impact activities affected by them. However, research on requirements quality lacks a comprehensive overview of these activities and how to measure them. In this paper, we specify the research endeavor addressing this gap and propose an initial model of requirements-affected activities and their attributes. We construct a model from three distinct data sources, including both literature and empirical data. The results yield an initial model containing 24 activities and 16 attributes quantifying these activities. Our long-term goal is to develop evidence-based decision support on how to optimize the fitness for purpose of the RE phase to best support the subsequent, affected software development process. We do so by measuring the effect that requirements artifacts and processes have on the attributes of these activities. With the contribution at hand, we invite the research community to critically discuss our research roadmap and support the further evolution of the model.

Index Terms:

requirements engineering, requirements quality, literature review, interview study, activity

I Introduction

Requirements engineering (RE) is a means to an end and aims to fulfill a purpose, i.e., to inform subsequent activities of the software development life cycle about the needs and constraints of relevant stakeholders [1]. Therefore, requirements artifacts and processes must be fit for purpose. This fitness for purpose is determined by the attributes of the software development activities that are affected by requirements artifacts or processes [2]. For example, a requirements specification is considered fit for purpose when implementing (activity) its implied features works correctly, completely, and quickly (attributes), among other attributes. In that sense, we should judge the quality of requirements (and RE) based on the extent to which they are fit for purpose, i.e., how they impact the attributes of requirements-affected activities [3]. Still, research on requirements quality is dominated by studies aiming to determine the quality of a requirements specification solely based on normative metrics [4].

Recent endeavors to nuance requirements quality research with this activity-based perspective are promising [3, 1], but have so far not seen adoption in practice [2]. One reason for this is the lack of an overview of software development activities that are affected by requirements engineering as well as their measurable attributes. This gap was acknowledged in previous requirements quality research [5, 3] and is one milestone on requirements quality research roadmaps [1, 2]. The overview of the activities that are potentially affected by RE would offer guidance on which activities determine the fitness for purpose of RE processes and artifacts. Furthermore, an overview of the activities’ attributes would offer guidance on how to measure their performance. Consequently, we formulate the following research questions:

• RQ1: Which software development activities are affected by requirements artifacts?
• RQ2: By which attributes are requirements-affected activities evaluated?

This paper initializes the endeavor to create and maintain an overview of requirements-affected activities and attributes answering the research questions. As the first step, we inductively construct an initial model from three distinct data sources (Section III). The model contains 24 activities like implementing, testing, and estimating effort, and characterizes them with 16 attributes including duration, completeness, and correctness (Section IV). The paper further describes how to apply the model in research and practice and how future research will advance the endeavor (Section V). We disclose all material, data, and source code (archived at https://zenodo.org/doi/10.5281/zenodo.10869626) to facilitate this community endeavor.

II Background and Related Work

II-A Requirements Use in SE

We consider as an activity any SE-relevant process performed by a (human or software) agent that uses one or more input artifacts and produces one or more output artifacts [3]. Figure 1 visualizes a simplified overview of SE activities, the artifacts they use as an input and produce as an output, and their scope. For example, the implementing activity receives several input artifacts like a requirements specification and system architecture to produce output artifacts like source code.

[Figure 1: Simplified overview of SE activities with their input and output artifacts; requirements-affected activities are highlighted in yellow.]

We consider an activity requirements-affected if at least one of its input artifacts is a requirements artifact (yellow activities in Figure 1). The aforementioned implementing activity is requirements-affected because it considers a requirements specification as an input. In the simplified example in Figure 1, the requirements elicitation and the deployment activity are not requirements-affected. It is, however, possible that the requirements elicitation activity is affected by requirements artifacts of previous projects and sprints or that explicit deployment requirements exist.
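The definition above can be sketched as a small data structure. This is a minimal illustration, not part of the published model: the class names, the `is_requirements_artifact` flag, and the "stakeholder needs" input are assumptions introduced here.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Artifact:
    name: str
    # Assumed flag: marks artifacts that count as requirements artifacts.
    is_requirements_artifact: bool = False


@dataclass
class Activity:
    name: str
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)

    def is_requirements_affected(self) -> bool:
        # An activity is requirements-affected if at least one of its
        # input artifacts is a requirements artifact.
        return any(a.is_requirements_artifact for a in self.inputs)


spec = Artifact("requirements specification", is_requirements_artifact=True)
architecture = Artifact("system architecture")
source_code = Artifact("source code")
needs = Artifact("stakeholder needs")  # hypothetical input, for illustration

implementing = Activity("implementing", [spec, architecture], [source_code])
eliciting = Activity("requirements elicitation", [needs], [spec])

print(implementing.is_requirements_affected())
print(eliciting.is_requirements_affected())
```

In this simplified setup, elicitation produces a requirements artifact but does not consume one, so it is not requirements-affected; adding a specification from a previous sprint to its inputs would change that.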

II-B Requirements Quality

Since requirements artifacts are used as input to requirements-affected activities, the artifacts’ quality affects the quality of these activities and their output [6]. For example, a vague requirements specification may lead to incorrect or missing features and reduced customer acceptance [7]. These quality defects are more expensive to fix the later they are addressed [8]: Revising a vague requirements specification is less expensive than redeveloping a faulty system built on it. Therefore, organizations aim to detect and remove requirements quality defects as early as possible [9].

However, requirements quality research focuses predominantly on normative quality factors [4] that do not consider an impact on affected activities [9, 2]. For example, the use of passive voice is often advised against in literature [10, 11, 12] despite a lack of empirical evidence for its negative consequences [13, 14, 15]. This makes organizations skeptical about adopting requirements quality research [16, 17].

To address this issue, Femmer et al. proposed the perspective of activity-based requirements quality [3]. This perspective entails that requirements are only as good as the support they provide to the activities in which they are used [1], i.e., requirements quality depends on the performance of requirements-affected activities. Specifying requirements quality as fitness for purpose to support affected activities requires requirements quality research to understand requirements-affected activities, i.e., to identify and measure the activities affected by a requirements artifact [2].

Without a systematic elicitation of requirements-affected activities prior to investigating the quality of a requirements artifact, researchers risk drawing incomplete conclusions. For example, Ricca et al. investigate the effect of screen mock-ups on requirements comprehension [18] and conclude that providing screen mock-ups improves the understandability of requirements. Femmer et al. confirm this effect but point out that screen mock-ups simultaneously have a negative effect on requirements maintainability [3]. Systematic studies on activity-based requirements quality agree that an overview of requirements-affected activities and their attributes is necessary to advance relevant requirements quality research [3, 1, 2].

II-C Related work

Requirements engineering literature contains several studies about the impact of requirements quality on subsequent software development activities. For example, Kamata et al. [19] and Zowghi et al. [20] empirically investigated the impact of requirements quality on project success measured in time and cost overrun. Similarly, Knauss et al. studied the impact of requirements quality on project success measured by customer satisfaction [21]. These studies generalize the affected activities and summarize their effect on the overall project outcome.

Studies focusing on more specific activities include Chari et al. investigating the impact of requirements defects on injected software defects [22], and Femmer et al. relating the use of passive voice to the domain modeling activity [15]. On the other hand, some studies expand the scope of affected activities. Damian et al. conducted a longitudinal case study observing a full project development lifespan and measured the tradeoffs of a revised RE process on several activities like communication, effort estimations, and implementation [23]. Mendez et al. conducted a large-scale, global survey of perceived problems in RE and their effects on activities, including designing, implementing, and organizing [7].

Research on traceability between software development artifacts constitutes another closely related domain. Several secondary studies have summarized traceability research and identified artifacts that are commonly connected [24, 25]. Although requirements artifacts are prominent targets of trace links, they are typically connected to other artifact types, not the activities that produce them [24]. These artifact types can be used to infer the producing activities, though the inferred activities typically remain on a very high level [25]. Furthermore, this limitation excludes by design all activities that do not necessarily or only rarely produce artifacts, for example, informal reviewing, modifying existing artifacts, assessing feasibility, or estimating effort.

In summary, none of the previously mentioned primary studies systematizes the affected activities and their attributes; rather, they select the studied impact based on the availability of data or anecdotal hypotheses, and traceability research exhibits significant limitations regarding the identification of these activities. Only two studies known to the authors attempt to explicate the affected activities. Femmer et al. elicited the activities affected by specific requirements artifacts at a case company and determined the qualitative impact of requirements defects on them [3]. In a similar study, Frattini investigated requirements quality factors relevant to a case company and their impact on subsequent activities [5]. Both studies prototype a model of requirements-affected activities for the specific context but acknowledge the need for a more systematic and comprehensive overview.

III Goal and Early Method

One goal of activity-based requirements quality research is to create and maintain a comprehensive model of requirements-affected activities and their attributes exhibiting the following properties [1, 2]:

1. Applicability: The model can represent all requirements-affected activities and attributes in any given SE context.
2. Suitability: The model can be used to evaluate relevant activities by means of their attributes.
3. Extensibility: The model can be extended with new activities or attributes.
4. Usability: The model can be accessed and comprehended by software engineers.

In this study, we contribute the first version of this model. Since we are not aware of any systematic prior work collecting requirements-affected activities and their attributes [1, 2], we surveyed different data sources for textual descriptions of SE activities that use requirements artifacts as input. From these textual mentions, we inductively construct a model of requirements-affected activities and their attributes by employing thematic synthesis as proposed by Cruzes and Dybå [26].

III-A Data Collection

To ensure the property of applicability as mentioned above, we collected data from three distinct sources described in the following three subsections: a systematic review of experimentation literature (Section III-A1), an interview study (Section III-A2), and a literature study on software process models (Section III-A3).

III-A1 Systematic Literature Review

The first source of textual descriptions of requirements-affected activities and their attributes that we considered comprised scientific studies reporting controlled experiments in which the experimental task involves human subjects and considers requirements as an input artifact. These experimental tasks simulate requirements-affected SE activities performed by practitioners. The dependent variables in these experiments are eligible attributes describing the performance of the activity. We adopted the systematic literature survey method employed by Sjøberg et al. [27].

Database selection. To ensure that our database search for eligible primary studies targets publications relevant to SE, we pre-selected eligible journals and conferences (from here on collectively called venues) from the CORE ranking (https://www.core.edu.au/) whose field of research is software engineering. To ensure a high quality of the primary studies, we constrained the venues to those of rank A* or A. A few select venues of lower rank that are particularly relevant to the topic constituted an exception. These included the Requirements Engineering Journal, the Journal of Software: Evolution and Process, the International Working Conference on Requirements Engineering: Foundation for Software Quality, the International Conference on Product-Focused Software Process Improvement, and the Euromicro Conference on Software Engineering and Advanced Applications, which all have a CORE rank of B. Additionally, we removed all venues that host computer science rather than SE topics. This task was performed jointly by three authors to ensure reliability. The final database selection contained 35 venues (10 journals and 25 conferences).

Database search. We performed a keyword-based database search for each included venue with the keywords experiment* as well as requirement* (or the synonyms srs or specification*). These keywords limited the retrieved primary studies to those (1) describing an experiment and (2) involving requirements at least to some degree. We executed the database search via Scopus (https://www.scopus.com/search/form.uri?display=advanced) and, in four cases where Scopus did not index publications of that venue, via the ACM Digital Library (https://dl.acm.org/). The search string per venue consisted of the two sets of keywords as well as a limitation to the venue via its title. For example, the search string for the ACM Computing Surveys journal in Scopus looked as follows: SRCTITLE ( computing AND surveys ) AND TITLE-ABS-KEY ( requirement* OR srs OR specification* ) AND TITLE-ABS-KEY ( experiment* ). The search per venue returned between 1 (e.g., from the European Conference on Object-Oriented Programming) and 175 (from the Journal of Systems and Software) primary studies for a total of 1446 studies.
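The per-venue search string can be generated mechanically. The following sketch reproduces the Scopus example above; the function name and the rule for turning a venue title into SRCTITLE terms (lower-casing and joining the words with AND) are assumptions inferred from that single example, not a documented procedure.

```python
def scopus_query(venue_title: str) -> str:
    """Build a Scopus advanced-search string limited to one venue."""
    # Assumption: each word of the venue title becomes one SRCTITLE term.
    title_terms = " AND ".join(venue_title.lower().split())
    return (
        f"SRCTITLE ( {title_terms} ) "
        "AND TITLE-ABS-KEY ( requirement* OR srs OR specification* ) "
        "AND TITLE-ABS-KEY ( experiment* )"
    )


print(scopus_query("Computing Surveys"))
```

Venue titles containing punctuation or stop words would likely need manual adjustment before being passed to such a helper.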

Inclusion. Next, we performed an inclusion phase to ensure the following properties of primary studies expressed by the two inclusion (I1 and I2) and four exclusion criteria (E1-E4):

• I1: The primary study presents an experiment with human subjects as one of its core contributions.
• I2: The experimental task uses a requirements specification as an input.
• E1: The experimental task is a requirements review.
• E2: The study is not written in English.
• E3: The publication is not available via the university’s access program.
• E4: The study is a duplicate of or extended by an already included study.

I1 ensures that eligible primary studies present a proper experiment (regardless of whether it is controlled or quasi) that involves human subjects. Otherwise, the experimental task would not simulate an SE activity, the concept of interest. This excludes, for example, experiments in which machine learning algorithms of different configurations are compared on a classification task. I2 ensures that the activity is requirements-affected. E1 explicitly excludes requirements review tasks, i.e., requirements defect detection and removal activities. The purpose of identifying requirements-affected activities is to optimize the affecting requirements in a way that improves their impact on the activities. This optimization process is the requirements review. Hence, we excluded these studies to avoid a circular impact, i.e., suggesting to optimize requirements for the reviewing activity, which is exactly this optimization. E2 and E3 exclude inaccessible studies, and E4 removes content duplicates. Primary studies were considered for the next data analysis step when they met both inclusion criteria and none of the exclusion criteria. The first author conducted the inclusion step based on the title, abstract, and keywords. Out of 1446 primary studies, 145 (10.3%) were included. To ensure the reliability of this subjective process, the second author independently performed the inclusion task on 75 (i.e., 5.2%) randomly selected studies. We calculate the inter-rater agreement using Bennett’s S-Score [28], which is robust against uneven marginal distributions [29]. The inter-rater agreement yields a value of 92%, which we deem sufficient to instill confidence in this subjective task.
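Bennett’s S corrects the observed agreement only for the number of categories k, not for the raters’ marginal distributions: S = (k · P_o − 1) / (k − 1). The sketch below is a minimal implementation under that standard definition. The two label lists are hypothetical and are not the study’s data; they are chosen so that 72 of 75 binary decisions agree, which is the agreement level that yields S = 0.92 for k = 2.

```python
def bennett_s(rater_a, rater_b, num_categories):
    """Bennett's S for two raters labeling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty item set")
    if num_categories < 2:
        raise ValueError("at least two categories are required")
    # Observed proportion of items on which both raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)
    k = num_categories
    return (k * observed - 1) / (k - 1)


# Hypothetical include (True) / exclude (False) decisions on 75 studies.
first_rater = [True] * 10 + [False] * 65
second_rater = [True] * 7 + [False] * 68  # differs on three studies

print(round(bennett_s(first_rater, second_rater, num_categories=2), 2))
```

Because expected agreement is fixed at 1/k, the score does not drop when one class (here, exclusion) dominates, which is the robustness property referred to above.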

Data Extraction. The first author reviewed all 145 included primary studies and extracted, for each human-subject experiment in each study, (1) the description of the experimental task and (2) all dependent variables measured to evaluate the performance of the task. The description of the experimental task constituted the source of requirements-affected activities, and the dependent variables were the source of their attributes. While reviewing the full text of the studies, 22 studies turned out not to meet all inclusion criteria, contrary to what the title, abstract, and keywords had suggested. We excluded these 22 studies from further processing.

Additionally, we excluded extractions where the attribute description did not imply a valuation. Because our goal was to identify attributes that quantify the performance of their respective activity, eligible attributes must be valuating, i.e., values of that attribute must imply a degree of performance. While attributes do not necessarily have to be measured on an interval scale (i.e., it is not important to associate an interval of the attribute, like a certain number of minutes for the attribute duration, with a specific level of quality), they have to be measured at least on an ordinal scale, i.e., the direction of a difference is important (more minutes of duration is worse, fewer minutes is better). For example, if the dependent variable of an experiment investigating the activity of estimating effort is the estimated amount of hours [30], then this data point (i.e., pair of activity and attribute) was excluded, as a higher or lower value of that attribute does not automatically make it good or bad due to the lack of ground truth. If, instead, the dependent variable was precision, i.e., how close the estimated amount of hours is to actual implementation time, then the data point would be included, as a higher value of precision (i.e., an estimation that is closer to the actual time) is better. This process eliminated 12 descriptions of non-valuating attributes. To assess the validity of this process, the third author independently repeated the task on a sample of 12 data points, which consisted of 6 random samples from each of the two classes (valuation vs. no valuation), and we measured the inter-rater agreement using Cohen’s Kappa [31] since the classes have an even marginal distribution [29]. The first overlap achieved a Cohen’s Kappa of only 33.3%, which emphasized the complexity of the task. The two authors reconvened, discussed the differences, reformulated the exclusion criteria, and repeated the labeling. The second overlap achieved a score of 83.3%, which represents sufficient reliability of the step.
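Cohen’s Kappa compares the observed agreement P_o with the agreement P_e expected from the raters’ marginal distributions: κ = (P_o − P_e) / (1 − P_e). The sketch below implements this standard definition. The label sequences are hypothetical reconstructions, not the study’s data: they only show that, on a 6/6 sample, 8 of 12 agreements give κ = 1/3 and 11 of 12 agreements give κ = 5/6, which matches the magnitudes of the two reported scores.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's Kappa for two raters labeling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty item set")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    categories = set(count_a) | set(count_b)
    # Chance agreement derived from each rater's marginal label frequencies.
    expected = sum(count_a[c] * count_b[c] for c in categories) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical labels: "v" = valuating, "n" = not valuating.
first_rater = ["v"] * 6 + ["n"] * 6
round_one = ["v"] * 4 + ["n"] * 2 + ["n"] * 4 + ["v"] * 2  # four disagreements
round_two = ["v"] * 6 + ["n"] * 5 + ["v"]                  # one disagreement

print(round(cohens_kappa(first_rater, round_one), 3))
print(round(cohens_kappa(first_rater, round_two), 3))
```

With only 12 items, a single changed label shifts κ by roughly 0.17, which helps explain how a reformulated exclusion criterion could move the score so far between rounds.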

The extraction produced 142 descriptions of experimental tasks and 355 descriptions of dependent variables. Several experimental tasks were evaluated via multiple dependent variables, which is why the 355 resulting data points contain repeated descriptions of experimental tasks.

III-A2 Interview Study

The second source of textual descriptions of requirements-affected activities and their attributes that we considered comprised reports from industry practitioners about the usage of requirements specifications in subsequent SE activities. To this end, we evaluated the transcripts of a previously conducted interview study [5].

Interview Participants. The first author conducted the interview study in a large, globally distributed software development organization that specifies requirements using both free-form and constrained natural language (use cases) prior to each development cycle. A contact at the organization provided a sample of eight software engineers directly responsible for processing requirements specifications and developing solution specifications based on them. These eight engineers represent the majority of personnel in their role in the team that was involved in the study. The interview participants had an average of 3.5 years of experience in their role, 7.5 years with the organization, and 15.3 years as software engineers.

Interview. The original purpose of the interview was to identify which quality defects practitioners perceive in the requirements specifications that they process [5]. Because the elicitation of quality defects entailed mentioning what kind of subsequent activity is affected by each defect, the generated data served to identify requirements-affected activities and their attributes. For example, stating that vague requirements lead to a delay of the testing phase contains the requirements-affected testing activity and its attribute duration. To guide the semi-structured interview, we developed a protocol. In addition to demographic questions, the protocol contained one prompt per type of requirements quality. The types of requirements quality were derived from Montgomery et al. [9] and covered, among others, ambiguity, completeness, and traceability.

Data Extraction. All eight one-hour-long interviews were recorded, automatically transcribed using a speech-to-text conversion tool (https://www.descript.com/), and verified by the first author. Then, the first author extracted from the transcripts each mention of an activity affected by a requirements quality defect and how this effect was measured. The extraction produced 55 descriptions of affected activities but no descriptions of how this effect was measured on them.

III-A3 Literature Study

The third source of textual descriptions of requirements-affected activities and their attributes that we considered comprised descriptions of software process models. Software process literature describes processes and products of the SE life cycle and, hence, contains information about which activities are affected by requirements. Since software process literature is fairly mature [32], we have access to reliable summaries of process models.

Literature. We selected the book “Software Process Definition and Management” by Münch et al. [33] as a reliable summary of software process literature. The first author reviewed the descriptions of all seven lifecycle models, which cover the waterfall model [34], iterative enhancement [8], prototyping, the spiral model [35], the incremental commitment spiral model [36], Unified Process [37], and Cleanroom Development [38]. The first author extracted all textual mentions of requirements-affected activities and their attributes as prescribed by the lifecycle model. This extraction produced 21 textual descriptions of activities and one explicit description of an attribute.

III-B Data Analysis

Coding. The data collection phase over the three sources culminated in a table containing 218 textual descriptions of requirements-affected activities and 356 textual descriptions of their attributes. In the absence of a prior theory or model of requirements-affected activities, we resorted to an inductive coding process [26]. The first and third authors jointly established the level of granularity of the codes that were applied to the textual descriptions and documented this process in a guideline. The first author then performed the coding process independently and, upon completion, verified the assigned codes with the third author. For each pair of textual descriptions of an activity and attribute, we coded four concepts:

1. Activity: the requirements-affected activity
2. Activity attribute: a property evaluating an activity
3. Artifact: an output artifact produced by the activity
4. Artifact attribute: a property evaluating an artifact

The distinction of artifacts from activities was necessary since some activities were not evaluated directly but rather by the artifacts they produced. For example, duration is an attribute of the implementing activity, but several studies additionally evaluate that activity by measuring the coupling (artifact attribute) of the resulting source code (artifact).

Consolidation. The inductive coding process produced 24 unique codes for activities, 16 for activity attributes, 21 for artifacts, and 26 for artifact attributes. The first and third authors then created an abstraction hierarchy of identified activities and artifacts based on the guide to the software engineering body of knowledge [39]. For example, both the planning and the estimating effort activities are sub-types of the more abstract managing activity [39]. We decided to merge the activities interpreting and understanding with comprehending, as none of the data sources sufficiently distinguished between them. Future studies that properly differentiate these activities are necessary.

Once the hierarchy emerged, we associated each activity and artifact with the respective attributes that our data sources reported to characterize them. Whenever all activities or artifacts of a hierarchical group shared an attribute, we moved it to the higher-level activity or artifact for conciseness. Additionally, we made educated assumptions about the transferability of some attributes. For example, even though our data did not contain an instance of duration being evaluated on every activity, it is safe to assume that every activity can be characterized and evaluated by its duration. This step introduces some subjectivity but improves the applicability of the model.
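Moving shared attributes to the higher-level activity amounts to taking the intersection of the children’s attribute sets. The sketch below illustrates this for one level of the hierarchy. The function name and the dictionary layout are assumptions made for illustration, and the attribute assignments in the example are hypothetical (only the managing, planning, and estimating effort relationship is taken from the text).

```python
def hoist_shared_attributes(children, attributes):
    """Move attributes shared by all children of a parent up to that parent.

    children:   maps a parent activity to the list of its sub-activities
    attributes: maps an activity to the set of attributes observed for it
    Parents are processed in the given order, so callers with a multi-level
    hierarchy should list lower-level parents first.
    """
    result = {name: set(attrs) for name, attrs in attributes.items()}
    for parent, kids in children.items():
        if not kids:
            continue
        shared = set.intersection(*(result.get(k, set()) for k in kids))
        result.setdefault(parent, set()).update(shared)
        for kid in kids:
            result[kid] = result.get(kid, set()) - shared
    return result


hierarchy = {"managing": ["planning", "estimating effort"]}
observed = {  # hypothetical attribute assignments
    "planning": {"duration", "completeness"},
    "estimating effort": {"duration", "precision"},
}

print(hoist_shared_attributes(hierarchy, observed))
```

The transferability assumptions mentioned above (such as duration applying to every activity) are a separate, manual decision and are not derived by this intersection.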

III-C Data Availability

To achieve the goals of usability and extensibility of the resulting model, we disseminate it via GitHub (available at https://github.com/JulianFrattini/gere-r3a). The repository contains a reference to all considered data sources, guidelines and protocols for the data extraction, and a specification of the current model of requirements-affected activities and their attributes. More importantly, it contains guidelines on how to contribute new or revise existing activities and attributes. Using the version control system of GitHub (https://docs.github.com/en/get-started/using-git/about-git), we will foster a collaborative evolution of the model.

IV Results

IV-A Requirements-affected Activities and their Attributes

[Figure 2: Initial model of requirements-affected activities and their attributes.]

Figure 2 visualizes the initial model of requirements-affected activities and their attributes. The model is structured like a UML class diagram and makes use of the inheritance relationship. An activity, represented as a UML class, that inherits from another activity also exhibits its attributes. For brevity, artifacts are excluded from the visualization. The replication package contains an extended model that includes the artifacts. The root of the inheritance tree is the abstract activity processing, which represents every executable activity.

The model contains several activities that are commonly considered in research as requirements-affected activities, like modeling, prioritizing, implementing, and testing. Another prominent spot is taken by the merged activity comprehending, which dominates the distribution of activities among both experimental literature and interview statements. This corresponds to the prominence of ambiguity among the attributes of requirements quality in empirical research [9] and is supported by the fact that this activity precedes every other activity [3]. The model, furthermore, contains several less commonly investigated activities. For example, Murakami et al. investigate the activity of code review in which subjects are provided with a requirements specification [40]. Consolidating larger sets of requirements to identify a semantically equivalent subset [41, 42] is another rare example. The model also contains activities that did not appear in experimental studies but were reported by interview participants or prescribed by software process literature. The activity of prototyping is such an example that was both mentioned during the interviews and as part of lifecycle models. Furthermore, the following activities were all named by interview participants but not considered in the experimentation literature: coordinating internal stakeholders based on a requirement, reusing artifacts like source code based on a new requirement, and estimating feasibility of a requirement.

The attributes recorded in the model also vary in prevalence. The most commonly encountered attributes of an activity are duration, correctness, and completeness. These represent both simple-to-measure and critical properties of most activities. Additionally, we observed several attributes related to the effect that the activity has on the executing agent, for example, how certain an agent feels when executing the activity, how easy, enjoyable, motivating, and useful they perceive it to be, and how learnable the activity was. Rarely mentioned attributes include how robust an activity is against errors and how biased an activity becomes given some controlled stimulation.
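The inheritance semantics of the model map directly onto class inheritance in a programming language. The following sketch shows how an activity would exhibit the attributes of its ancestors. It is an illustration only: the class names follow activities named in the text, duration is placed on the root in line with the transferability assumption of Section III-B, and the placement of precision on estimating effort is a hypothetical choice rather than a statement about the published model.

```python
class Processing:
    """Abstract root activity from which every activity inherits."""

    attributes = frozenset({"duration"})

    @classmethod
    def all_attributes(cls):
        # Collect the attributes declared on this class and all ancestors.
        collected = set()
        for klass in cls.__mro__:
            collected |= vars(klass).get("attributes", frozenset())
        return collected


class Managing(Processing):
    attributes = frozenset()  # declares nothing of its own in this sketch


class EstimatingEffort(Managing):
    attributes = frozenset({"precision"})  # hypothetical placement


print(sorted(EstimatingEffort.all_attributes()))
print(sorted(Managing.all_attributes()))
```

Declaring an attribute once on a higher-level activity keeps the model concise while still making it available for evaluating every sub-activity.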

IV-B Implications

IV-B1 Implications for Research

The results contain multiple implications for requirements engineering and, specifically, requirements quality research. Firstly, the distribution of activities and attributes among the three data sources hints at potential research gaps. For example, the above-mentioned activities of prototyping, coordinating, reusing, and estimating feasibility have not appeared in the sample of primary studies. Secondly, the model provides guidance for comprehensive measurements of the software development life cycle with respect to the impact of requirements artifacts and processes. As determined by Femmer et al., only a holistic view of all requirements-affected activities will reliably determine the impact of any treatment in requirements artifacts or processes [3]. This affects all comparative studies in requirements engineering, i.e., all controlled and quasi-experiments aiming to evaluate the impact of a quality defect or the benefit of a new method. Only by measuring this impact on all requirements-affected activities in terms of their attributes and summarizing the total benefit or drawback can a holistic decision on the benefit or harm of any treatment be made. While we certainly do not suggest that every comparative study from here on must consider all 24 activities simultaneously, the model of requirements-affected activities provides at least a framework for integrating the results of multiple studies investigating the effect of the same treatment on different activities into one overall conclusion.

IV-B2 Implications for Practice

The resulting initial model of requirements-affected activities and their attributes may serve practitioners as an overview of activities to measure when attempting to understand the fitness for purpose of their requirements. The model emphasizes the diversity of activities that may be affected by requirements but also the diversity of metrics by which they can be evaluated. While attributes like completeness, correctness, and duration are likely to be covered in key performance indicators of organizations, attributes like usefulness, ease of use, and learnability may often be neglected. Further practical use of the model for quantitative comparisons requires future work and will be discussed in Section V.

IV-C Limitations

This study exhibits the following limitations. Firstly, the data extraction phase was performed by only one researcher. This introduces the risk that relevant information from the bodies of text is missing from the textual descriptions that were later coded. Secondly, the interview study was not performed with the research questions stated in Section I in mind. Instead, the main theme of the interview study was centered around the broader scope of requirements quality [5]. However, confirming previous studies that proposed that requirements quality inevitably depends on requirements-affected activities [3, 1, 2], the responses of interview participants naturally contained information that contributed to answering our research questions. Hence, we deem the interview data an eligible data source for this study. Thirdly, every step of the study where we depart from purely summarizing and reporting data and instead interpret it introduces researchers’ bias. This is particularly evident in the conscious merging of the understanding, interpreting, and comprehending activities but also in the assumption about the transferability of several activities’ attributes. This step was necessary to elevate the model beyond a systematic summary toward an evaluation framework as demanded in previous research roadmaps [1, 2]. We documented all interpretative steps and disclosed them in our replication package to allow other researchers to scrutinize these decisions. Finally, we address the threat to external validity. Full generalizability was out of the scope of this study, but we nevertheless briefly discuss all threats to external validity in order to justify the research plan presented in Section V. One threat to generalizability stems from the sampling of the literature survey, which only considers a specific set of SE-relevant venues and categorically excludes workshops. Additionally, the literature review is limited to experiments and excludes other methods like case studies. Another threat stems from the sample of interview participants, who represent only one team of only one company.

V Research Plan

V-A Model Extension

The limitations mentioned in Section IV-C necessitate the extension of the model to achieve goals 1 (applicability) and 2 (suitability) stated in Section III. Both the applicability and the suitability are inhibited by the potential incompleteness of the model. Hence, we plan to repeat the method presented in Section III. Two immediately planned extensions are (1) repeating the systematic literature survey on workshop papers and (2) replicating the interview study in different companies and teams. Because of the extensive documentation of data collection methods for both empirical data (i.e., interview transcripts) and meta-research (i.e., primary studies), as well as the data analysis protocol, we anticipate that the model extension can be distributed well within our network of researchers interested in requirements-affected activities.

V-B Model Maintenance

Goals 3 (extensibility) and 4 (usability) stated in Section III are fulfilled by the design of the chosen dissemination strategy. The authors of this study will maintain the GitHub repository containing the current content and structure of the model.

V-C Model Validation

The most significant step of future work is to validate whether the model achieves the four goals stated in Section III.

Validating applicability. To test whether the model can represent all requirements-affected activities and attributes in any given SE context, we plan to conduct multiple case studies in different company contexts. Once the model is deemed sufficiently extensive, we trace requirements artifacts in each case company to every instance of reuse. The process of tracing requirements artifacts to activities using these artifacts as input shall happen both directly, i.e., by interviewing involved stakeholders, and indirectly, i.e., by observing which stakeholder accesses the artifact and then following up on the purpose. The latter accounts for requirements-affected activities that stakeholders are not actively aware of, i.e., cases in which they unconsciously retrieve information to execute an activity without considering that this makes the activity requirements-affected. We consider goal 1 achieved if we do not encounter any requirements-affected activity that has no semantic equivalent in the model.
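The acceptance criterion for goal 1 can be illustrated with a minimal sketch. It assumes that semantic equivalence has already been judged manually and recorded in a lookup table. The activity names, the `equivalents` table, and the function name `uncovered_activities` are hypothetical and not part of the published model.

```python
# Minimal sketch of the goal-1 check; all names and data are illustrative assumptions.


def uncovered_activities(observed, model_activities, equivalents):
    """Return observed activities that have no semantic equivalent in the model.

    observed: activity names recorded while tracing requirements artifacts
    model_activities: set of activity names contained in the model
    equivalents: manually curated mapping from an observed name to the model
        activity judged semantically equivalent (assumed to exist beforehand)
    """
    missing = []
    for name in observed:
        canonical = equivalents.get(name, name)  # fall back to the name itself
        if canonical not in model_activities:
            missing.append(name)
    return missing


# Hypothetical excerpt of the model and of one case company's observations.
model_activities = {"understanding", "implementing", "prototyping"}
equivalents = {"interpreting": "understanding", "comprehending": "understanding"}
observed = ["interpreting", "prototyping", "negotiating contracts"]

missing = uncovered_activities(observed, model_activities, equivalents)
goal_1_achieved = not missing  # goal 1 holds only if nothing is left uncovered
```

In this invented example, one observed activity has no counterpart in the model excerpt, so the criterion would not be met and the model would need to be extended.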

Validating suitability. To test whether the model can be used to evaluate relevant activities by means of their attributes, we plan to conduct an empirical study involving all surveyed case companies. Given the already detected requirements-affected activities, we evaluate these via the attributes associated with the activities in our model to quantify their performance. We aim to produce two types of empirical investigations from this data. Firstly, we aim to survey the activities and generate an overview of attribute values for all affected activities. This overview provides an absolute comparison of the activities and answers questions like “Which activity phase takes the longest time?” or “Which development activity is perceived as the least enjoyable?”. Secondly, we aim to conduct quasi-experiments at the case companies investigating whether certain properties of requirements artifacts or processes have an impact. For example, the subject of the experiments could be the comparison between two types of template systems for requirements specification [43] or the avoidance of specific linguistic structures like passive voice [5]. The subject of the experiments will be aligned with current questions and endeavors of the case companies to optimize their requirements engineering artifacts or processes in an evidence-based manner. The results of the quasi-experiments will be measured in terms of differences in attribute values of all affected activities. This overview will provide companies with a summary of the effect that the proposed change has on all affected activities. We consider goal 2 achieved if the results generated by the surveys and quasi-experiments are accepted by the respective case companies.
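Measuring the results of a quasi-experiment as differences in attribute values can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `attribute_differences`, the data layout, the units, and all measurement values are hypothetical, and the plain difference of means stands in for whatever statistical analysis an actual study would apply.

```python
from statistics import mean


def attribute_differences(control, treatment):
    """Mean difference (treatment minus control) per (activity, attribute) pair.

    Both arguments map an (activity, attribute) tuple to a list of numeric
    measurements. Only pairs measured under both conditions are compared.
    """
    shared = sorted(control.keys() & treatment.keys())
    return {key: mean(treatment[key]) - mean(control[key]) for key in shared}


# Hypothetical measurements for one activity under two conditions.
control = {
    ("implementing", "duration"): [10.0, 12.0],    # assumed unit: hours
    ("implementing", "correctness"): [0.5, 0.75],  # assumed unit: share of passed checks
}
treatment = {
    ("implementing", "duration"): [8.0, 9.0],
    ("implementing", "correctness"): [0.75, 1.0],
}

differences = attribute_differences(control, treatment)
for (activity, attribute), delta in differences.items():
    print(f"{activity} / {attribute}: {delta:+.2f}")
```

Collecting one difference per activity and attribute in a single structure mirrors the intended summary of a treatment's effect across all affected activities. Whether a difference is meaningful remains a question for statistical inference, which this sketch does not address.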

Validating extensibility. To test whether the model can be extended with new activities or attributes, we aim to involve additional researchers in the model extension presented in Section V-A. By distributing the task beyond the authors of this study, we determine how easily other researchers can extend the model. We consider goal 3 achieved if external researchers extend the model successfully.

Validating usability. To test whether the model can be accessed and comprehended by software engineers, we plan to facilitate external replications of the validation of goals 1 and 2. This not only validates whether the model achieves goal 4 but also extends the empirical evidence about the impact of requirements on affected activities in different company contexts. We consider goal 4 achieved if external researchers successfully replicate the empirical studies.

VI Conclusion

Requirements artifacts and processes fulfill a specific purpose in the software development lifecycle, that is, to inform subsequent activities about the needs and constraints imposed by stakeholders on the system under development [3]. How fit requirements artifacts and processes are to fulfill their purpose, i.e., how well they benefit these requirements-affected activities, can be effectively determined when (1) all affected activities are known and (2) the performance of these activities can be evaluated. The need for a systematic overview of (1) requirements-affected activities as well as (2) the attributes which quantify their performance has been well recognized in requirements quality literature [3, 5] and evoked the call for a comprehensive model [1, 2].

We answer this call by proposing an initial model of requirements-affected activities and their attributes systematically derived from three distinct data sources. The model aims to support both researchers, by guiding empirical studies concerning the impact of requirements artifacts and processes, and practitioners, by offering an overview of attributes that may serve as key performance indicators of their requirements-affected activities. We envision that this model will be extended and evolved by the requirements engineering community to provide an applicable and suitable model for the task. We will actively maintain the presented resources to enable and foster this community endeavor.

Acknowledgment

This work was supported by the KKS foundation through the S.E.R.T. Research Profile project at Blekinge Institute of Technology. We further thank Parisa Yousefi and Charlotte Ljungman from Ericsson Karlskrona for facilitating the interview study.

References

[1] H. Femmer and A. Vogelsang, “Requirements quality is quality in use,” IEEE Software, vol. 36, no. 3, pp. 83–91, 2018.
[2] J. Frattini, L. Montgomery, J. Fischbach, D. Mendez, D. Fucci, and M. Unterkalmsteiner, “Requirements quality research: a harmonized theory, evaluation, and roadmap,” Requirements Engineering, pp. 1–14, 2023.
[3] H. Femmer, J. Mund, and D. M. Fernández, “It’s the activities, stupid! a new perspective on re quality,” in 2015 IEEE/ACM 2nd International Workshop on Requirements Engineering and Testing. IEEE, 2015, pp. 13–19.
[4] J. Frattini, L. Montgomery, J. Fischbach, M. Unterkalmsteiner, D. Mendez, and D. Fucci, “A live extensible ontology of quality factors for textual requirements,” in 2022 IEEE 30th International Requirements Engineering Conference (RE). IEEE, 2022, pp. 274–280.
[5] J. Frattini, “Identifying relevant factors of requirements quality: an industrial case study,” in Requirements Engineering: Foundation for Software Quality: 30th International Working Conference, REFSQ 2024, Winterthur, Switzerland, April 8–11, 2024, Proceedings 30. Springer, 2024.
[6] S. Wagner, D. M. Fernández, M. Felderer, A. Vetrò, M. Kalinowski, R. Wieringa, D. Pfahl, T. Conte, M.-T. Christiansson, D. Greer et al., “Status quo in requirements engineering: A theory and a global family of surveys,” ACM Transactions on Software Engineering and Methodology (TOSEM), vol. 28, no. 2, pp. 1–48, 2019.
[7] D. M. Fernández, S. Wagner, M. Kalinowski, M. Felderer, P. Mafra, A. Vetrò, T. Conte, M.-T. Christiansson, D. Greer, C. Lassenius et al., “Naming the pain in requirements engineering: Contemporary problems, causes, and effects in practice,” Empirical software engineering, vol. 22, pp. 2298–2338, 2017.
[8] B. W. Boehm, “Software engineering economics,” IEEE transactions on Software Engineering, no. 1, pp. 4–21, 1984.
[9] L. Montgomery, D. Fucci, A. Bouraffa, L. Scholz, and W. Maalej, “Empirical research on requirements quality: a systematic mapping study,” Requirements Engineering, vol. 27, no. 2, pp. 183–209, 2022.
[10] L. Kof, “Treatment of passive voice and conjunctions in use case documents,” in Natural Language Processing and Information Systems: 12th International Conference on Applications of Natural Language to Information Systems, NLDB 2007, Paris, France, June 27-29, 2007. Proceedings 12. Springer, 2007, pp. 181–192.
[11] G. Génova, J. M. Fuentes, J. Llorens, O. Hurtado, and V. Moreno, “A framework to measure and improve the quality of textual requirements,” Requirements engineering, vol. 18, pp. 25–41, 2013.
[12] K. Pohl, Requirements engineering fundamentals: a study guide for the certified professional for requirements engineering exam-foundation level-IREB compliant. Rocky Nook, Inc., 2016.
[13] J. Krisch and F. Houdek, “The myth of bad passive voice and weak words an empirical investigation in the automotive industry,” in 2015 IEEE 23rd International Requirements Engineering Conference (RE). IEEE, 2015, pp. 344–351.
[14] J. Frattini, D. Fucci, R. Torkar, and D. Mendez, “A second look at the impact of passive voice requirements on domain modeling: Bayesian reanalysis of an experiment,” in 1st International Workshop on Methodological Issues with Empirical Studies in Software Engineering (WSESE2024). ACM, 2024.
[15] H. Femmer, J. Kučera, and A. Vetrò, “On the impact of passive voice requirements on domain modelling,” in Proceedings of the 8th ACM/IEEE international symposium on empirical software engineering and measurement, 2014, pp. 1–4.
[16] H. Femmer, “Requirements quality defect detection with the qualicen requirements scout.” in REFSQ Workshops, 2018.
[17] X. Franch, D. Mendez, A. Vogelsang, R. Heldal, E. Knauss, M. Oriol, G. Travassos, J. C. Carver, and T. Zimmermann, “How do practitioners perceive the relevance of requirements engineering research?” IEEE Transactions on Software Engineering, 2020.
[18] F. Ricca, G. Scanniello, M. Torchiano, G. Reggio, and E. Astesiano, “On the effectiveness of screen mockups in requirements engineering: results from an internal replication,” in Proceedings of the 2010 ACM-IEEE international symposium on empirical software engineering and measurement, 2010, pp. 1–10.
[19] M. I. Kamata and T. Tamai, “How does requirements quality relate to project success or failure?” in 15th IEEE International Requirements Engineering Conference (RE 2007). IEEE, 2007, pp. 69–78.
[20] D. Zowghi and N. Nurmuliani, “A study of the impact of requirements volatility on software project performance,” in Ninth Asia-Pacific Software Engineering Conference, 2002. IEEE, 2002, pp. 3–11.
[21] E. Knauss, C. El Boustani, and T. Flohr, “Investigating the impact of software requirements specification quality on project success,” in Product-Focused Software Process Improvement: 10th International Conference, PROFES 2009, Oulu, Finland, June 15-17, 2009. Proceedings 10. Springer, 2009, pp. 28–42.
[22] K. Chari and M. Agrawal, “Impact of incorrect and new requirements on waterfall software project outcomes,” Empirical Software Engineering, vol. 23, pp. 165–185, 2018.
[23] D. Damian and J. Chisan, “An empirical study of the complex relationships between requirements engineering processes and other processes that lead to payoffs in productivity, quality, and risk management,” IEEE Transactions on Software Engineering, vol. 32, no. 7, pp. 433–453, 2006.
[24] M. Borg, P. Runeson, and A. Ardö, “Recovering from a decade: a systematic mapping of information retrieval approaches to software traceability,” Empirical Software Engineering, vol. 19, pp. 1565–1616, 2014.
[25] S. Charalampidou, A. Ampatzoglou, E. Karountzos, and P. Avgeriou, “Empirical studies on software traceability: A mapping study,” Journal of Software: Evolution and Process, vol. 33, no. 2, p. e2294, 2021.
[26] D. S. Cruzes and T. Dyba, “Recommended steps for thematic synthesis in software engineering,” in 2011 international symposium on empirical software engineering and measurement. IEEE, 2011, pp. 275–284.
[27] D. I. Sjøberg, J. E. Hannay, O. Hansen, V. B. Kampenes, A. Karahasanovic, N.-K. Liborg, and A. C. Rekdal, “A survey of controlled experiments in software engineering,” IEEE transactions on software engineering, vol. 31, no. 9, pp. 733–753, 2005.
[28] E. M. Bennett, R. Alpert, and A. Goldstein, “Communications through limited-response questioning,” Public Opinion Quarterly, vol. 18, no. 3, pp. 303–308, 1954.
[29] G. C. Feng, “Mistakes and how to avoid mistakes in using intercoder reliability indices.” Methodology: European Journal of Research Methods for the Behavioral and Social Sciences, vol. 11, no. 1, p. 13, 2015.
[30] K. Moløkken and M. Jørgensen, “Expert estimation of web-development projects: are software professionals in technical roles more optimistic than those in non-technical roles?” Empirical Software Engineering, vol. 10, pp. 7–30, 2005.
[31] J. Cohen, “A coefficient of agreement for nominal scales,” Educational and psychological measurement, vol. 20, no. 1, pp. 37–46, 1960.
[32] I. Sommerville, “Software process models,” ACM computing surveys (CSUR), vol. 28, no. 1, pp. 269–271, 1996.
[33] J. Münch, O. Armbrust, M. Kowalczyk, and M. Sotó, Software process definition and management. Springer, 2012.
[34] W. W. Royce, “Managing the development of large software systems: concepts and techniques,” in Proceedings of the 9th international conference on Software Engineering, 1987, pp. 328–338.
[35] B. W. Boehm, “A spiral model of software development and enhancement,” Computer, vol. 21, no. 5, pp. 61–72, 1988.
[36] B. Boehm and J. A. Lane, “Guide for using the incremental commitment model (icm) for systems engineering of dod projects,” usc-csse-2009-500, 2008.
[37] I. Jacobson, G. Booch, and J. Rumbaugh, “The unified software development process,” 1999.
[38] H. D. Mills, M. Dyer, and R. C. Linger, “Cleanroom software engineering,” 1987.
[39] P. Bourque and R. Fairley, “Swebok,” Nd: IEEE Computer society, 2004.
[40] Y. Murakami, M. Tsunoda, and H. Uwano, “Wap: Does reviewer age affect code review performance?” in 2017 IEEE 28th International Symposium on Software Reliability Engineering (ISSRE). IEEE, 2017, pp. 164–169.
[41] J. Natt och Dag, T. Thelin, and B. Regnell, “An experiment on linguistic tool support for consolidation of requirements from multiple sources in market-driven product development,” Empirical Software Engineering, vol. 11, pp. 303–329, 2006.
[42] K. Wnuk, M. Höst, and B. Regnell, “Replication of an experiment on linguistic tool support for consolidation of requirements from multiple sources,” Empirical Software Engineering, vol. 17, pp. 305–344, 2012.
[43] K. Großer, M. Rukavitsyna, and J. Jürjens, “A comparative evaluation of requirement template systems,” in 2023 IEEE 31st International Requirements Engineering Conference (RE). IEEE, 2023, pp. 41–52.