Dave Binkley's Research Interests


Contents




Software Engineering and Testing

My present work in software engineering includes clone detection and improving genetic algorithms for test-case generation. Code clones are identical or nearly identical parts of a program. While their removal is not always advantageous, an understanding of the clones in a program is. This work has recently expanded to consider clones in visual languages such as Max/MSP. Random test-case generation can provide surprising coverage; however, test data for the remaining parts of the code can be rather difficult to find. Genetic algorithms are one promising approach to covering these hard-to-cover parts of the code. They suffer, however, in situations where the local fitness landscape provides little guidance; thus my work on techniques for landscape improvement is aimed at improving test-case generation. The work below also includes older work on semantics-based software engineering tools and regression testing, largely conducted at NIST, where we examined the technology transfer of research ideas to industry.
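As a hypothetical sketch of how a fitness signal such as branch distance can guide a genetic search toward a hard-to-cover branch (the target constant, mutation operator, and population parameters here are illustrative assumptions, not taken from any of the papers below):

```python
import random

# Hypothetical branch under test: taken only when x == 4242.
# Random testing rarely covers it; the branch distance |x - 4242|
# gives the search a slope to follow toward the branch.
TARGET = 4242

def branch_distance(x):
    """0 when the hard branch is taken; larger means further away."""
    return abs(x - TARGET)

def evolve(pop_size=40, generations=1000, seed=0):
    """Minimal generational GA: truncation selection plus mutation."""
    rng = random.Random(seed)
    population = [rng.randint(0, 100_000) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=branch_distance)
        if branch_distance(population[0]) == 0:
            break  # found an input that covers the branch
        survivors = population[: pop_size // 2]
        children = [rng.choice(survivors) + rng.randint(-50, 50)
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    return min(population, key=branch_distance)

best_input = evolve()
```

When the landscape is flat (e.g., the predicate is `hash(x) == 0`), the distance provides no gradient and the search degenerates to random testing, which is the situation that landscape-improvement techniques target.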

Relevant Publications




Software Recommendation

My work on software recommendation develops novel recommendation technology for improving the construction of large software systems, including software product families, which are among the largest software systems ever constructed. The current work focuses on association rule mining, an unsupervised learning technique that infers relationships among items in a data set. Association rule mining is used to produce evolutionary couplings that can then be used to recommend files potentially missed by an engineer.
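A minimal sketch of the idea: mine single-antecedent co-change rules from a change history and use them to flag files an engineer may have missed. The commit history, file names, and thresholds below are fabricated for illustration:

```python
from collections import defaultdict
from itertools import permutations

# Hypothetical change history: each transaction is the set of files
# modified together in one commit.
commits = [
    {"parser.c", "parser.h", "lexer.c"},
    {"parser.c", "parser.h"},
    {"parser.c", "parser.h", "main.c"},
    {"lexer.c", "lexer.h"},
]

def mine_rules(transactions, min_support=2, min_confidence=0.6):
    """Mine rules a -> b: 'commits that change a usually also change b'."""
    item_count = defaultdict(int)
    pair_count = defaultdict(int)
    for t in transactions:
        for item in t:
            item_count[item] += 1
        for a, b in permutations(t, 2):
            pair_count[(a, b)] += 1
    rules = {}
    for (a, b), n in pair_count.items():
        confidence = n / item_count[a]
        if n >= min_support and confidence >= min_confidence:
            rules[(a, b)] = confidence
    return rules

def recommend(rules, changed):
    """Files the mined rules suggest are missing from the change set."""
    return {b for (a, b) in rules if a in changed} - set(changed)

rules = mine_rules(commits)
print(recommend(rules, {"parser.c"}))  # -> {'parser.h'}
```

Here the rule parser.c -> parser.h holds with support 3 and confidence 1.0, so an engineer who changed only parser.c would be reminded about its header; this is the evolutionary coupling the text describes, in miniature.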

Relevant Publications




Semantics-based Software Engineering Tools including Information Retrieval in Software Engineering

The big-picture view of my research program is a focus on improving software engineering tools through the use of program semantics. The growing base of installed legacy source code (large programs whose authors are often unknown), combined with the increased complexity of modern software, makes it increasingly important for a software engineer to have good tool support. Tools assist in the construction of new programs, the understanding and modification of existing programs, and the verification and validation of both. A tool that exploits the underlying semantics of a program is more likely to provide useful high-level information. Traditionally, my research in this area has focused on exploiting meaningful programming-language semantic information. Interestingly, this work has recently broadened to incorporate natural-language semantics through the application of Information Retrieval (IR) inspired techniques. This is done by processing the text from the source code and other software artifacts using existing IR algorithms and newly developed IR-based algorithms.
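As a small illustration of one classic IR technique applied to source text, the sketch below ranks files against a query using TF-IDF weights and cosine similarity; the corpus (identifier words extracted from three imaginary files) and the query are invented for the example:

```python
import math
import re
from collections import Counter

# Hypothetical corpus: identifier text extracted from three source files.
documents = {
    "account.c":  "open account deposit withdraw balance account",
    "report.c":   "print report format page header",
    "transfer.c": "transfer account balance check limit",
}

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def tfidf_vectors(docs):
    """Plain TF-IDF: term frequency weighted by inverse document frequency."""
    n = len(docs)
    df = Counter()
    for text in docs.values():
        df.update(set(tokenize(text)))
    return {name: {t: c * math.log(n / df[t])
                   for t, c in Counter(tokenize(text)).items()}
            for name, text in docs.items()}

def cosine(u, v):
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def rank(query, docs):
    """Files ordered by textual similarity to the query."""
    vecs = tfidf_vectors(docs)
    q = Counter(tokenize(query))
    return sorted(vecs, key=lambda d: cosine(q, vecs[d]), reverse=True)

print(rank("account balance", documents))  # account.c ranks first
```

Real applications (e.g., feature location or traceability recovery) add identifier splitting, stemming, and richer models, but the core move is the same: treat code's natural-language content as a searchable corpus.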

Relevant Publications




Program Slicing and Clustering

The slice of a program with respect to a set of program elements S is a projection of the program that includes only those elements that might affect (either directly or transitively) the values of the variables used at members of S. Slicing allows one to find semantically meaningful decompositions of programs, where the decompositions consist of elements that are not textually contiguous. For example, slicing allows a tax computation to be extracted from a mortgage-payment program. My present work on program slicing concentrates on the impact of dependence clusters and on slicing techniques for extended finite state machines (EFSMs), which are used in a growing number of applications and tools. Dependence clusters are sets of statements that all mutually depend on one another. Large dependence clusters interfere with the work of programmers and tools; thus a better understanding of the makeup of dependence clusters, and even techniques for breaking them apart, should improve programmer and tool performance. While it is possible to naively translate program slicing techniques to EFSMs, better approaches treat the features of the EFSM natively.
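At its core, a backward slice is a transitive closure over dependence edges. The toy sketch below illustrates this on the mortgage-payment example from the text; the statement names and edges are invented (a real slicer computes slices over a program dependence graph built from data- and control-dependence analysis):

```python
# Toy dependence graph for a mortgage-payment program: each statement
# maps to the statements it directly depends on (data or control).
# All names are illustrative.
depends_on = {
    "print_tax":       ["compute_tax"],
    "compute_tax":     ["read_income", "read_rate"],
    "print_payment":   ["compute_payment"],
    "compute_payment": ["read_principal", "read_rate"],
    "read_income":     [],
    "read_rate":       [],
    "read_principal":  [],
}

def backward_slice(criterion):
    """Statements that might affect the slicing criterion: the
    transitive closure of the dependence edges."""
    seen = set()
    work = [criterion]
    while work:
        stmt = work.pop()
        if stmt not in seen:
            seen.add(stmt)
            work.extend(depends_on.get(stmt, []))
    return seen

tax_slice = backward_slice("print_tax")
```

The slice on `print_tax` pulls in the tax computation and its inputs but none of the payment code, even though the two computations could be interleaved in the source text. If every statement depended (transitively) on every other, the whole program would fall into one slice; that degenerate case is a dependence cluster.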

Relevant Publications




Software for Safety Critical Systems (NIST)

My interest in safety critical systems has focused on the use of tools for producing high integrity software and in particular tools for producing safe C++ software. High integrity software is software that can and must be trusted to work dependably in critical applications (e.g., software in safety systems of nuclear power plants, medical devices, electronic banking, air traffic control, automated manufacturing, and some business systems).

Relevant Publications




Publications

Journal Publications

Book Chapters

Conference Publications

Reprinted in Collections

Other Publications and Reports

Patents


Students



You are granted permission for the non-commercial reproduction, distribution, display, and performance of this technical report in any format, BUT this permission is only for a period of 45 (forty-five) days from the most recent time that you verified that this technical report is still available from the Computer Science Department at Loyola College Maryland under terms that include this permission. All other rights are reserved by the author(s) and copyright holders.