Gain research experience in data science applied to wicked software engineering problems.


Science of Software (SoS) Research Experience for Undergraduates Summer 2017

The Science of Software REU Site at NC State University immerses a diverse group of undergraduates in a vibrant research community working on data science and software engineering. Research projects include visualization and data manipulation in virtual reality, model-based reasoning, human aspects (e.g., eye-tracking data), bug fixing, and software analytics, which combines software engineering and data mining methods.

Students will work alongside research faculty to gain hands-on experience with data science skills: machine learning, data engineering, statistics, and studying developer behavior through observation, interviews, surveys, biometrics, and data collection. REU students present their work at an undergraduate research competition, develop videos to convey their research to K-12 students, and are encouraged to submit their technical papers for peer-reviewed publication.


Additional Project Details

Dr. Kathryn Stolee

Understanding Regular Expression Understandability. Most programmers use regular expressions when they code, and those same programmers often complain about them, too! Have you ever wondered why regular expressions are so hard to read and write? The goals of this project are to 1) explore the learning barriers programmers encounter when reading, writing, and fixing regular expressions, and 2) develop techniques to automatically repair broken regular expressions. You will learn the basics of source code mining and source code analysis, and explore how programmers learn.

Crowdsourcing Quality. Crowdsourcing has become a common approach for companies, researchers, and individuals to acquire the opinions of hundreds, if not thousands, of humans for very little money. However, current platforms fall short in their ability to guide the crowd toward producing quality results. In this work, we will explore and evaluate techniques for crowd control when obtaining opinions on software models.

Dr. Tim Menzies

Reading is hard. Software engineers and researchers spend thousands of hours a year poring through documentation and research papers. We are building intelligent librarians to assist humans in navigating all that textual data.

In this project we explore and adapt state-of-the-art text-mining methods from evidence-based medicine and legal electronic discovery to better support software engineers and software engineering researchers. Our current state of the art lets us "read" 10,000 papers by skimming just a few dozen, then asking text miners to find other potentially relevant papers. But those methods are preliminary, and much further work is required before we can ask a wider audience to use our tools. We have a software laboratory called MAR (machine-assisted reading) in which we can run extensive experiments on better ways to help people explore large sets of documents. MAR is not really "one" tool; rather, it is a place where we can experiment with many tools. Think of it as a platform on top of which you can rapidly explore text-mining methods.

So, do you want to learn a lot about text mining and scientific experimentation? Then sign up for this project.

Dr. Christopher Parnin

VersionBot. Software is built by composing software libraries and other dependencies. By failing to update out-of-date dependencies, software can suffer from numerous security vulnerabilities and bugs. Developers are often occupied with maintaining core functionality and adding new features, and as a result often ignore or delay updating dependencies. In our research, we have built software bots that can automatically upgrade software dependencies. However, some dependencies are more difficult to update than others. We wish to extend our approach by automatically building an index of API version migrations that have occurred on GitHub and deriving a "confidence score" for how safe or difficult it is to upgrade between versions. By examining version changes and the associated client changes, it would be possible to automatically rate the nature and extent of those changes (renames, parameter changes, new code, configuration settings). Based on this confidence score, we would be able to avoid attempting upgrades that might be unsuccessful.

Dr. Emerson Murphy-Hill

Developer Adaptive Modeling. Most software engineering tools assume that all software developers are the same. Bug trackers, code review tools, and static analysis tools look the same to every developer, even though some developers know more and some developers know less about the concepts those tools convey. In this project, we'll explore how those tools can adapt automatically to the developer looking at them.

Gender Bias in Software Development. The creation of software is a noble and rewarding pursuit, but like all human endeavors, those who participate are not immune to harmful biases. One of these biases is gender bias. Building on our prior work that examined gender bias in pull requests on GitHub, in this project we'll explore other aspects of gender bias, including how people talk to one another and participant resiliency.

Error Messages. Error messages produced by software engineering tools, from refactoring tools to profiling tools, are notoriously obscure. While these messages have improved significantly in the last 30 years, novices continue to find them difficult to understand, and even experts find them opaque. In this project we'll seek to better understand the challenges developers face, as well as design new messages that address those challenges.

Other projects

The projects you can work on are not limited to the ones above. You may find opportunities to contribute to many other projects during your participation in the program.


Dr. Christopher Parnin

Bio-sensing, machine learning, and data exploration in virtual reality.

Dr. Laurie Williams

Understanding best practices in privacy, security, and continuous deployment.

Dr. Tim Menzies

Data mining, process optimization, and prediction.

Dr. Kathryn Stolee

Program analysis, code search, empirical software engineering, and crowdsourcing.

Dr. Emerson Murphy-Hill

Human factors in software engineering and developer tools.

Dr. Sarah Heckman

Educational tools for teaching software engineering.

Dr. Nicholas Kraft (ABB)

Software maintenance and evolution; developer behavior and productivity.


Important Dates

  • Application Deadline: February 15th, 2017
  • Notification: March 1st, 2017 (early applications may receive early notification)
  • Move In: May 28th–29th, 2017; Move Out: August 4th, 2017
  • Summer Program: Tuesday, May 30th – Friday, August 4th, 2017
  • Symposium: Tuesday, August 1st, 2017


Program Benefits

  • Work with active and experienced research faculty on emerging and socially relevant research topics.
  • Receive a stipend, housing, and parking on campus.
  • Students are eligible to apply for funding to travel to a conference.



Eligibility Requirements

  • Applicant must be a U.S. citizen or permanent resident of the United States.
  • Applicant must be and remain an undergraduate student in good standing.
  • Applicant must plan to complete a degree program.
  • Students must submit a written report and give an oral presentation on work performed at the REU Site.
  • Students may participate in Computing REU programs at NC State a maximum of two summers.

Reference Letters

Please instruct your reference writers to upload their letter to this site.

This REU Site is sponsored by awards made through the NSF.


Have questions?

Thinking about joining us? That's great! Give us a call or send us an email, and we will answer any questions you have.