I'm looking for a summer internship opportunity as a data scientist or software researcher/engineer.

PhD CS North Carolina State University May, 2013 - Dec, 2017 (expected)
MS EE Beijing University of Posts & Telecomm Sep, 2009 - Feb, 2012
BS EE Nanjing University of Technology Sep, 2005 - Jun, 2009
Intern, ABB USCRC, Software Engineering Group, Raleigh, USA May, 2016 - Aug, 2016
  • Cleaned and visualized historical software development data across all teams in ABB.
  • Applied data mining techniques to build predictive models and help improve software development within ABB.
  • Explored software engineering research questions based on proprietary data.
Visiting student, Tsinghua University, Beijing, CHINA May, 2012 - May, 2013
  • Designed a robust, null-space-based interference-avoidance strategy for D2D communication.
  • Investigated interference issues in small-cell communication networks.
Intern, China Unicom Design Institute Co., Ltd., Beijing, CHINA Mar, 2011 - Oct, 2011
  • Researched standards for relay techniques in LTE-Advanced systems.
  • Conducted independent research into relay technology at the network physical layer, principally on the performance analysis of relay networks combined with network coding.
Evolutionary Approaches vs. Grid Search Feb, 2016 - Apr, 2016

    Grid search is the de facto parameter tuning tool and is available in many data mining packages such as Weka, R, and Scikit-learn. Previous results show that grid search takes an extremely long time to run and suffers from the "curse of dimensionality". In this project, we find that for software analytics tasks such as defect prediction:

  • Evolutionary algorithms such as differential evolution perform no worse than grid search,
  • while requiring far fewer evaluations than grid search.
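As a rough illustration of the curse of dimensionality mentioned above, the sketch below counts how many configurations an exhaustive grid must evaluate. The function name and the example numbers are assumptions chosen for illustration, not figures from the project.

```python
from itertools import product


def grid_search_evaluations(candidate_values):
    """Count the configurations an exhaustive grid search has to evaluate.

    candidate_values is a list with one entry per tuned parameter; each entry
    is the list of values tried for that parameter (hypothetical values).
    """
    return sum(1 for _ in product(*candidate_values))


# Hypothetical example: 10 candidate values for each of 1..4 parameters.
# The count grows as 10 ** n_parameters.
growth = [grid_search_evaluations([range(10)] * n) for n in range(1, 5)]
```

Under these assumed settings the grid needs 10, 100, 1,000, and 10,000 evaluations for one to four parameters, which is why a search method with a small fixed evaluation budget is attractive.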
Transfer Learning in Software Engineering May, 2015 - Nov, 2015

    The goal of the research is to enable software engineers to find software development best practices from past empirical data. For some new projects or small companies, historical data may not be available. This project proposes to use defect data from different software projects or even different companies to build a defect prediction model by transfer learning methods.

  • Three feature matching techniques are proposed to select data with the "best" features from different projects to form a local training set, and to transfer that knowledge to build predictive models for new projects.
  • Further, instead of using all available data to train learners, we find that a small sample of the data is good enough to perform transfer learning. This is useful because we do not need to wait until all historical data has been collected, and it speeds up the transfer learning process.
Parameter Tuning for Defect Prediction Sep, 2014 - Aug, 2015

    One of the "black arts" of data mining is setting the tuning parameters that control the miner. By using search algorithms such as differential evolution (DE), we offer a simple, automatic, and very effective method for finding those tunings for software defect predictors. We find that

  • DE tuning improves the performance of the defect predictor; for example, it can raise the detection F-measure from 12% to 78%.
  • DE tuning changes conclusions about which learners are better than others.
  • DE tuning changes conclusions about which factors are most important in software engineering.
  • DE tuning is not impractically slow; it usually requires fewer than 100 evaluations.
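The tuning loop described above can be sketched as a minimal differential evolution (DE/rand/1/bin) minimizer. This is an illustrative sketch, not the project's implementation: the control settings (`pop_size`, `f`, `cr`, `generations`) are assumed defaults, and `toy_error` is a hypothetical stand-in for "1 minus the F-measure of a learner trained with these parameters".

```python
import random


def differential_evolution(score, bounds, pop_size=10, f=0.75, cr=0.3,
                           generations=8, seed=1):
    """Minimize score(candidate) inside the box `bounds` (needs pop_size >= 4).

    Returns (best_candidate, best_score, number_of_evaluations).
    """
    rng = random.Random(seed)
    population = [[rng.uniform(lo, hi) for lo, hi in bounds]
                  for _ in range(pop_size)]
    scores = [score(candidate) for candidate in population]
    evaluations = pop_size
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct members other than the current one.
            others = [p for j, p in enumerate(population) if j != i]
            a, b, c = rng.sample(others, 3)
            forced = rng.randrange(len(bounds))  # always mutate one dimension
            trial = []
            for d, (lo, hi) in enumerate(bounds):
                if d == forced or rng.random() < cr:
                    value = a[d] + f * (b[d] - c[d])
                    value = min(max(value, lo), hi)  # clip to the bounds
                else:
                    value = population[i][d]
                trial.append(value)
            trial_score = score(trial)
            evaluations += 1
            # Greedy selection: keep the trial only if it is no worse.
            if trial_score <= scores[i]:
                population[i], scores[i] = trial, trial_score
    best = min(range(pop_size), key=scores.__getitem__)
    return population[best], scores[best], evaluations


def toy_error(params):
    """Hypothetical objective standing in for a learner's tuning error."""
    return sum((p - 0.3) ** 2 for p in params)


bounds = [(0.0, 1.0), (0.0, 1.0)]
best, best_score, used = differential_evolution(toy_error, bounds)
```

With these assumed defaults the budget is 10 initial evaluations plus 8 generations of 10 trials, that is, 90 evaluations in total, in line with the "fewer than 100 evaluations" observation.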
Build a Continuous Delivery Pipeline from Scratch Aug, 2016 - Dec, 2016
  • BUILD: a component that automatically creates a build server capable of building a target project in response to commit events and triggering a post-build task; it also tracks and displays the history of past builds via HTTP.
  • TEST: a component that can generate test cases and run unit tests, fuzzing tests, and advanced fuzzing tests driven by genetic algorithms.
  • ANALYSIS: a component that runs existing static analysis tools, like Jlint, to measure coverage and analyze code.
  • DEPLOY: a component that can configure the production environment automatically, deploy the application, monitor the deployed application, auto-scale production components, and perform canary releases.
Centralized P2P File Sharing System Feb, 2014 - Apr, 2014

    Implemented a simple peer-to-peer system with a centralized index. A concurrent server communicates with multiple clients simultaneously and keeps the peer and file lists up to date as peers join and leave at random. When a peer joins the P2P system, it adds its local file names to the file list on the server; to download, it looks up a specific file on the server and then fetches it from a destination peer over TCP.
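A minimal sketch of the centralized index described above, assuming an in-memory table guarded by a lock so that concurrent client handlers can update it safely. The class and method names are hypothetical, and the TCP transport is omitted.

```python
import threading


class CentralIndex:
    """In-memory peer and file index kept by the central server (sketch)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._files_by_peer = {}  # (host, port) -> set of file names

    def join(self, peer, file_names):
        """Register a peer together with the files it offers."""
        with self._lock:
            self._files_by_peer[peer] = set(file_names)

    def leave(self, peer):
        """Remove a peer and its files; unknown peers are ignored."""
        with self._lock:
            self._files_by_peer.pop(peer, None)

    def lookup(self, file_name):
        """Return the peers that currently offer file_name, sorted."""
        with self._lock:
            return sorted(peer for peer, files in self._files_by_peer.items()
                          if file_name in files)


index = CentralIndex()
index.join(("10.0.0.1", 7001), ["notes.txt", "paper.pdf"])
index.join(("10.0.0.2", 7002), ["paper.pdf"])
holders = index.lookup("paper.pdf")   # both peers
index.leave(("10.0.0.1", 7001))
remaining = index.lookup("paper.pdf")  # only the second peer
```

A downloading peer would then open a TCP connection to one of the returned addresses; that step is not shown here.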

    A Simple FTP File Sharing System Mar, 2014 - Apr, 2014

    Implemented a simple FTP in a client-server architecture using the Go-back-N automatic repeat request (ARQ) scheme. Specifically, the implementation encapsulates application data into transport-layer segments by adding transport headers; buffers and manages data received from, or to be delivered to, the application through the UDP socket interface; adjusts the window size at the sender; and computes checksums.
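The Go-back-N retransmission behaviour can be illustrated with a small round-based simulation. This is a simplified sketch under stated assumptions: the sender transmits a full window per round, ACKs are never lost, and a loss is modeled by listing which transmissions the channel drops. The function and parameter names are hypothetical, and no real sockets are involved.

```python
def go_back_n_send(n_segments, window_size, lost_transmissions=frozenset()):
    """Simulate a Go-back-N sender and return the sequence numbers sent.

    lost_transmissions holds 0-based positions in the transmission log that
    the (simulated) channel drops.
    """
    log = []   # every sequence number put on the wire, in order
    base = 0   # oldest unacknowledged segment
    while base < n_segments:
        expected = base  # receiver's next expected sequence number
        for seq in range(base, min(base + window_size, n_segments)):
            dropped = len(log) in lost_transmissions
            log.append(seq)
            # The receiver accepts only the in-order segment and discards
            # everything that arrives after a gap.
            if not dropped and seq == expected:
                expected += 1
        # The cumulative ACK slides the window; after a loss, the timeout
        # makes the sender go back and resend from the new base.
        base = expected
    return log


clean = go_back_n_send(5, 3)
with_loss = go_back_n_send(5, 3, lost_transmissions={1})
```

Without loss the log is `[0, 1, 2, 3, 4]`. When the second transmission is dropped, segment 2 is discarded by the receiver as out of order, so the sender goes back to segment 1 and the log becomes `[0, 1, 2, 1, 2, 3, 4]`.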

    Intelligent Patrol Monitoring System Mar, 2010 - Dec, 2010
    • Designed the overall architecture of this browser/server (B/S) system.
    • Designed and developed the map display module, which shows the real-time or previous positions of GPS terminals and provides other geographical operations for users of the system.
    • Designed and developed the server push module, which automatically transmits the data received from GPS terminals to the client PCs logged in to the system.
    • Designed and developed the communication package that sends and receives data between the GPS terminals and the system server over SMS and the GPRS network.

    • [J3] JC. Nam, W. Fu, S. Kim, T. Menzies, L. Tan, "Heterogeneous Defect Prediction," IEEE Transactions on Software Engineering, 2015 (2nd round review).
    • [J2] W. Fu, V. Nair, T. Menzies, "Why is Differential Evolution Better than Grid Search for Tuning Defect Predictors?", Information and Software Technology, submitted Sep 8.
    • [J1] W. Fu, T. Menzies, X. Shen, "Tuning for Software Analytics: is it Really Necessary?", Information and Software Technology 76 (2016): 135-146.
    • [C4] A. Agrawal, W. Fu, and T. Menzies, "What is wrong with topic modeling? (and how to fix it using search-based SE)," arXiv preprint arXiv:1608.08176 (2016) (submitted to ICSE 2017).
    • [C3] R. Krishna, T. Menzies, and W. Fu, "Too much automation? The bellwether effect and its implications for transfer learning," Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering, ACM, 2016.
    • [C2] W. Fu, R. C. Yao, F. Gao, J.C.F. Li, and M. Lei, "Robust Null-Space Based Interference Avoiding Scheme for D2D Communication Underlaying Cellular Networks," IEEE WCNC, Shanghai, China, Apr. 2013.
    • [C1] X. Wang, C. W. Yuan, W. Fu, "A Semi-anonymous Non-contact Offline Mobile Payment Protocol based on Concurrent Signature," ICMAM, Hong Kong, China, Dec. 2011.

    • F. Gao, W. Fu, J.C.F. Li, and M. Lei, "Null-space Based Robust Interference Mitigation Method for Multiple-antenna D2D Communication System," 2013.