Papers by David J. Malan
For CS50 at Harvard, we have developed a suite of free, open-source tools to help students with writing, testing, and submitting programming assignments and to help teachers grade those assignments and check them for similarities. help50 parses often-cryptic error messages and explains them in beginner-friendly terms. check50 runs a set of automated tests on students' code, providing feedback on errors. style50 lints students' code, highlighting lines that don't adhere to the course's style guide. submit50 allows students to submit assignments to a GitHub repository, without students needing to have knowledge of git or version control themselves. And compare50 allows teachers to analyze submissions for similarity, looking for pairs or clusters of submissions that might be the result of improper collaboration. In this workshop, we'll introduce each of these tools and discuss how other teachers can use them in their own classrooms. Along the way, we'll discuss how to use the tools effectively, compare and contrast them with alternatives, identify how the tools have changed students' behavior for the better and for worse, and highlight pedagogical and technological changes we've made to redress the latter.
Botnets allow adversaries to wage attacks on unprecedented scales at unprecedented rates, motivation for which is no longer just malice but profits instead. The longer botnets go undetected, the higher those profits. I present in this thesis an architecture that leverages collaborative networks of peers in order to detect bots across the same. Not only is this architecture both automated and rapid, it is also high in true positives and low in false positives. Moreover, it accepts as realities insecurities in today's systems, tolerating bugs, complexity, monocultures, and interconnectivity alike. This architecture embodies my own definition of anomalous behavior: I say a system's behavior is anomalous if it correlates all too well with other networked, but otherwise independent, systems' behavior. I provide empirical validation that collaborative detection of bots can indeed work. I validate my ideas in both simulation and the wild. Through simulations with traces of 9 variants of worms and 25 non-worms, I find that two peers, upon exchanging summaries of system calls recently executed, can decide that they are, more likely than not, both executing the same worm as often as 97% of the time. I deploy an actual prototype of my architecture to a network of 29 systems with which I monitor and analyze 10,776 processes, inclusive of 511 unique non-worms (873 if unique versions constitute unique non-worms). Using that data, I expose the utility of temporal consistency (similarity over time in worms' and non-worms' invocations of system calls) in collaborative detection. I identify properties with which to distinguish non-worms from worms 99% of the time. I find that a collaborative network, using patterns of system calls and simple heuristics, can detect worms running on multiple hosts. And I find that collaboration among peers significantly reduces the risk of false positives because of the unlikely, simultaneous appearance across peers of non-worm processes with worm-like properties.
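The exchange of system-call summaries between two peers, described in the abstract above, might be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how summaries are built or compared, so the n-gram summary, the weighted Jaccard measure, the `threshold` value, and the call names in the traces are all hypothetical choices, not the thesis's method.

```python
from collections import Counter
from typing import Sequence


def summarize(calls: Sequence[str], n: int = 3) -> Counter:
    """Summarize a trace as counts of its n-grams of consecutive system calls.

    Assumption: n-grams are one plausible summary; the thesis may use another.
    """
    return Counter(tuple(calls[i:i + n]) for i in range(len(calls) - n + 1))


def similarity(a: Counter, b: Counter) -> float:
    """Weighted Jaccard similarity of two summaries, in the range [0, 1]."""
    keys = set(a) | set(b)
    if not keys:
        return 0.0
    shared = sum(min(a[k], b[k]) for k in keys)
    total = sum(max(a[k], b[k]) for k in keys)
    return shared / total


def likely_same_process(a: Counter, b: Counter, threshold: float = 0.5) -> bool:
    """Decide whether two peers are, more likely than not, running the same code.

    The threshold is a hypothetical parameter chosen for illustration.
    """
    return similarity(a, b) > threshold


# Hypothetical traces: two peers running the same program, one running another.
peer_a = summarize(["connect", "send", "open", "write", "close", "connect", "send"])
peer_b = summarize(["connect", "send", "open", "write", "close", "connect", "send"])
peer_c = summarize(["open", "read", "read", "close"])
```

In this sketch, a process is flagged only when its summary correlates strongly with a summary reported by another, otherwise independent, peer, which mirrors the abstract's definition of anomalous behavior.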
Proceedings of the 53rd ACM Technical Symposium on Computer Science Education V. 2, Mar 3, 2022
Odds are we've all used (or tried!) quite a few tools to facilitate efficiency inside and outside of the classroom and empower students to learn more effectively, whether on campus or off. Some of those tools are perhaps homegrown and unique to one's own institution, but freely available educational technologies abound as well, some in the cloud, some for Macs and PCs, some open-source. And quite a few commercial tools offer free or discounted educational plans as well. In this BoF, we'll begin with a whirlwind tour of the tools we ourselves use, identifying the problems they solve and how well, then quickly open the floor to everyone to share their favorites as well. Along the way, we'll jot down every tool mentioned and share the results. With educational technology an evergreen landscape, this year's list will surely be different from last! Attendees should exit this session with a better understanding of the current landscape, familiarized with innovations they can bring back to their own classes (whether high school, undergraduate, or graduate), without reinventing wheels themselves.
Behaviour & Information Technology, Jan 6, 2020
Massive open online courses (MOOCs) show highly irregular participation behaviour among users. In this study, using data from Computer Science 50x of HarvardX, we investigated one extreme, yet common strategy to foresee the endgame: taking the final problem set at the beginning of the course. We found such a strategy to be the only dominant trajectory alternative to following the sequence prescribed by the syllabus. Whereas all students who took and passed the final problem set at the beginning of the course subsequently completed the course, those who took and failed the final problem set at the beginning of the course finished the fewest number of milestones, even fewer than those who never attempted the final problem set. Moreover, students with a lower prior programming proficiency were more likely than better prepared students both to take the final problem set early and to fail it. This study revealed the disconcerting phenomenon that many students dropped out of a MOOC because, apparently, their confidence was crushed even before they learned any course content. The study suggests that future MOOC practices and policies should offer informative and constructive syllabi to accommodate students' need for previewing the endgame.
This workshop introduces participants to CS50 IDE (cs50.io), a web-based integrated development environment based on Cloud9 (c9.io). Not only does the IDE enable students to work on programming projects within a browser, without need for local downloads or installations, it also provides students with an integrated terminal window and full sudo privileges. Underneath the hood is a Docker "container" that allows students to experiment with the underlying Ubuntu Linux OS, installing and configuring software at will, adapting it to their particular projects' needs. The IDE supports any compiler, interpreter, or other software that can be installed via a Linux command-line, while the IDE itself provides a fully-featured text editor for text files and source code that reside on the underlying instance. The Cloud9 GUI is fully extensible through a plugin system and is leveraged by CS50 IDE to provide additional functionality for students. Among the additional features implemented through this mechanism are a GUI-based file submission system, an optional "less comfortable" mode that simplifies the GUI to provide a scaffolded experience for students new to programming, and a GUI front end for the GNU Project Debugger, a CLI debugger for many languages, including C. This workshop will highlight useful features of the IDE in the context of classrooms (including the collaborative nature of a workspace to allow pair programming or provide alternative one-on-one instruction), provide tips for writing or adapting assignments based on its architecture, and introduce developing plugins for full customization.
Current distributed sensor network platforms lack comprehensive low-power routing techniques and efficient public key cryptography mechanisms. Reducing power for individual radio transmissions has not been explored sufficiently. Popular sensor node platforms do not include a mechanism for distributing and redistributing shared cryptographic keys among nodes. This paper discusses a technique to tailor node transmit power to the lowest practical level while maintaining reliable network links and presents the first known implementation of elliptic curve cryptography for sensor networks. Results demonstrate that dynamic radio output power scaling is effective in reducing node power consumption by orders of magnitude in certain scenarios. Analysis suggests that secret-key cryptography is already viable on the UC Berkeley MICA2 mote and public-key infrastructure may also be tractable despite the device's limited memory.
Journal of Computing Sciences in Colleges, Jun 1, 2013
ABSTRACT
This work presents the first known implementation of elliptic curve cryptography for sensor networks, motivated by those networks' need for an efficient, secure mechanism for shared cryptographic keys' distribution and redistribution among nodes. Through instrumentation of UC Berkeley's TinyOS, this work demonstrates that secret-key cryptography is already viable on the MICA2 mote. Through analyses of another's implementation of modular exponentiation and of its own implementation of elliptic curves, this work concludes that public-key infrastructure may also be tractable in 4 kilobytes of primary memory on this 8-bit, 7.3828-MHz device.
Section 4 aspires, in turn, to mitigate those weaknesses with its own implementation of Diffie-Hellman, based on the Elliptic Curve Discrete Logarithm Problem (ECDLP), and an analysis thereof. Section 5 proposes directions for future work, while Section 6 explores related work. Section 7 concludes.
2 SKIPJACK and the MICA2
TinyOS currently offers the MICA2 access control, authentication, integrity, and confidentiality through TinySec, a link-layer security mechanism based on SKIPJACK in CBC mode. An 80-bit symmetric cipher, SKIPJACK is the formerly classified algorithm behind the Clipper chip, approved by the National Institute of Standards and Technology (NIST) in 1994 for the Escrowed Encryption Standard [50]. Through use of a shared, group key does TinySec provide for access control; with message authentication codes does it provide for messages' authentication and integrity; and with encryption does it provide for confidentiality. Unfortunately, TinySec's reliance on shared keys renders the mechanism particularly vulnerable to attack. After all, the MICA2 is intended for deployment in sensor networks. For reasons of cost and logistics, long-term physical security of the devices is unlikely. Compromise of the network, therefore, reduces to compromise of any one node.
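To make the idea of Diffie-Hellman over elliptic curves concrete, here is a minimal sketch of the key exchange. It is not the paper's implementation: it swaps in a tiny textbook curve over the prime field F_17 (y^2 = x^3 + 2x + 2, generator (5, 1) of order 19) purely for readability, and the secret values 3 and 7 are arbitrary. Parameters this small offer no security whatsoever.

```python
from typing import Optional, Tuple

Point = Optional[Tuple[int, int]]  # None stands for the point at infinity

# Toy parameters (assumption, for illustration only).
P_MOD = 17          # field prime
A = 2               # curve: y^2 = x^3 + A*x + 2 over F_17
G: Point = (5, 1)   # generator of a group of order 19


def add(p: Point, q: Point) -> Point:
    """Add two points on the curve using the affine group law."""
    if p is None:
        return q
    if q is None:
        return p
    (x1, y1), (x2, y2) = p, q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None  # q is the inverse of p
    if p == q:
        slope = (3 * x1 * x1 + A) * pow((2 * y1) % P_MOD, -1, P_MOD)
    else:
        slope = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD)
    slope %= P_MOD
    x3 = (slope * slope - x1 - x2) % P_MOD
    y3 = (slope * (x1 - x3) - y1) % P_MOD
    return (x3, y3)


def multiply(k: int, p: Point) -> Point:
    """Compute k*p by double-and-add (point multiplication)."""
    result: Point = None
    addend = p
    while k > 0:
        if k & 1:
            result = add(result, addend)
        addend = add(addend, addend)
        k >>= 1
    return result


# Each node keeps a secret scalar and publishes secret*G.
alice_secret, bob_secret = 3, 7
alice_public = multiply(alice_secret, G)
bob_public = multiply(bob_secret, G)

# Each side combines its own secret with the other's public point.
alice_shared = multiply(alice_secret, bob_public)
bob_shared = multiply(bob_secret, alice_public)
```

Both nodes arrive at the same point without ever transmitting a secret, so compromise of one node need not expose a key shared by the whole network. Recovering a secret scalar from a public point is the ECDLP, which is trivial on this toy curve and is only hard for much larger parameters.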
Springer eBooks, 2006
This paper describes the Advanced Forensic Format (AFF), which is designed as an alternative to current proprietary disk image formats. AFF offers two significant benefits. First, it is more flexible because it allows extensive metadata to be stored with images. Second, AFF images consume less disk space than images in other formats (e.g., EnCase images). This paper also describes the Advanced Disk Imager, a new program for acquiring disk images that compares favorably with existing alternatives.
The speed of today's worms demands automated detection, but the risk of false positives poses a difficult problem. In prior work, we proposed a host-based intrusion-detection system for worms that leveraged collaboration among peers to lower its risk of false positives, and we simulated this approach for a system with two peers. In this paper, we build upon that work and evaluate our ideas "in the wild." We implement Wormboy 2.0, a prototype of our vision that allows us to quantify and compare worms' and non-worms' temporal consistency, similarity over time in worms' and non-worms' invocations of system calls. We deploy our prototype to a network of 30 hosts running Windows XP with Service Pack 2 to monitor and analyze 10,776 processes, inclusive of 511 unique non-worms (873 if we consider unique versions to be unique non-worms). We identify properties with which we can distinguish non-worms from worms 99% of the time. We find that our collaborative architecture, using patterns of system calls and simple heuristics, can detect worms running on multiple peers. And we find that collaboration among peers significantly reduces our probability of false positives because of the unlikely appearance on many peers simultaneously of non-worm processes with worm-like properties.
In recent months have teachers become publishers of content and students subscribers thereof by way of podcasts, feeds of audio, video, and other content that can be downloaded to clients like iTunes and devices like iPods. In the fall of 2005, we ourselves began to podcast Harvard Extension School's Computer Science E-1 in both audio and video formats, the first course within Harvard University to do so. Our goals were to provide students with more portable access to educational content and to involve them in technology itself. To evaluate this experiment, we have analyzed logs and surveys of students. We find that our students valued E-1's podcast more as a vehicle for review (45%) than as an alternative to attendance (18%). We also find that most students (71%) tended to listen to or watch lectures on their computers, with far fewer relying upon audio-only (19%) or video (10%) iPods. We argue, meanwhile, that podcasting, despite its widespread popularity, is but a marginal improvement on trends long in progress. It is this technology's reach that we claim is significant, not the technology itself. Logs suggest that E-1's own podcast, available not only to students but to the public at large, has acquired (as of September 2006) between 6,000 and 10,000 subscribers from over 50 countries. We argue, then, that podcasting offers to extend universities' educational reach more than it offers to improve education itself.
ACM Transactions on Sensor Networks, Aug 1, 2008
We present a critical evaluation of the first known implementation of elliptic curve cryptography over $\mathbb{F}_{2^p}$ for sensor networks based on the 8-bit, 7.3828-MHz MICA2 mote. We offer, along the way, a primer for those interested in the field of cryptography for sensor networks. We discuss, in particular, the decisions underlying our design and alternatives thereto. And we elaborate on the methodologies underlying our evaluation. Through instrumentation of UC Berkeley's TinySec module, we argue that, although symmetric cryptography has been tractable in this domain for some time, there has remained a need, unfulfilled until recently, for an efficient, secure mechanism for distribution of secret keys among nodes. Although public-key infrastructure has been thought impractical, we show, through analysis of our original implementation for TinyOS of point multiplication on elliptic curves, that public-key infrastructure is indeed viable for TinySec keys' distribution, even on the MICA2. We demonstrate that public keys can be generated within 34 seconds and that shared secrets can be distributed among nodes in a sensor network within the same time, using just over 1 kilobyte of SRAM and 34 kilobytes of ROM. We demonstrate that communication costs are minimal, with only 2 packets required for transmission of a public key among nodes. We make available all of our source code for other researchers to download and use. And we discuss recent results based on our work that corroborate and improve upon our conclusions.
I worry over topics for the syllabus, fretting over demos and presentations. And yet, I always come back to the fact that most of what my students learn and remember from my course comes from the assignments. Great assignments are hard to dream up and time-consuming to develop. With that in mind, the Nifty Assignments session is all about promoting
Springer eBooks, 2006
Many of today's privacy-preserving tools create a big file that fills up a hard drive or USB storage device in an effort to overwrite all of the "deleted files" that the media contain. But while this technique is widespread, it is largely unvalidated. We evaluate the effectiveness of the "big file technique" using sector-by-sector disk imaging on file systems running under Windows, Mac OS, Linux, and FreeBSD. We find the big file is effective in overwriting file data on FAT32, NTFS, and HFS, but not on Ext2fs, Ext3fs, or Reiserfs. In one case, as much as 60MB of a 488MB device was not overwritten with the technique. Also, file metadata such as filenames are rarely overwritten. We present a theoretical analysis of the file sanitization problem and evaluate the effectiveness of a commercial implementation that implements an improved strategy.
9 e.g., dd if=/dev/zero of=volume.iso
10 e.g., mdconfig -a -t vnode -f volume.iso -u 0; newfs /dev/md0
11 e.g., cp -pR /volume1 /volume2
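The "big file technique" evaluated above can be sketched in a few lines. This is a hedged illustration, not any tool's actual implementation: the function name `write_big_file`, the `max_bytes` safety cap, and the chunk size are assumptions introduced here. A real sanitizer would keep writing until the volume is full, whereas this sketch stops at the cap so that it is safe to run.

```python
import errno
import os
import tempfile


def write_big_file(path: str, max_bytes: int, chunk_size: int = 1 << 20) -> int:
    """Write zeros to `path` until `max_bytes` are written or the volume fills.

    Returns the number of bytes written. `max_bytes` is a safety cap added
    for this sketch; the technique proper has no such limit.
    """
    written = 0
    zeros = bytes(chunk_size)
    # Unbuffered I/O so that write() reports how many bytes actually landed.
    with open(path, "wb", buffering=0) as f:
        try:
            while written < max_bytes:
                n = min(chunk_size, max_bytes - written)
                written += f.write(zeros[:n])
        except OSError as e:
            # A full volume ends the loop; any other error is unexpected.
            if e.errno != errno.ENOSPC:
                raise
        os.fsync(f.fileno())  # push the data out of the OS cache
    return written


# Demonstration in a scratch directory with a small cap.
with tempfile.TemporaryDirectory() as scratch:
    target = os.path.join(scratch, "bigfile.bin")
    written = write_big_file(target, max_bytes=3 * 4096, chunk_size=4096)
    size_on_disk = os.path.getsize(target)
```

As the abstract reports, filling free space this way leaves file metadata such as filenames largely intact and, on some file systems, misses part of the data as well, so the sketch shows the technique being evaluated rather than a reliable sanitizer.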
Journal of Computer Assisted Learning, Feb 27, 2020
Massive open online course (MOOC) studies have shown that precourse skills (such as precomputational thinking) and course engagement measures (such as making multiple submission attempts with assignments when the initial submission is incorrect) predict students' grade performance, yet little is known about whether these factors predict students' course retention. In applying survival analysis to a sample of more than 20,000 participants from one popular computer science MOOC, we found that students' precomputational thinking skills and their perseverance in assignment submission strongly predict their persistence in the MOOC. Moreover, we discovered that precomputational thinking skills, programming experience, and gender, which were previously considered to be constant predictors of students' retention, have effects that attenuate over the course milestones. This finding suggests that MOOC educators should take a growth perspective towards students' persistence: As students overcome the initial hurdles, their resilience grows stronger.
1 | INTRODUCTION
The massive open online course (MOOC) was formally introduced to the internet in 2011 (Ng & Widom, 2012). By the year 2017, more than 9,000 MOOCs had come into existence, hosted by more than 800 higher education institutions, serving more than 80 million learners (Shah, 2018). MOOCs have no entry requirements and are easy to access (Kop, 2011; Lee, 2017), have huge numbers of participants (Cohen & Soffer, 2015; Sharples et al., 2012), often partner with prestigious higher educational institutions (Cusumano, 2014), and charge a low or no fee for a wide range of materials, such as lecture videos, online discussion forums, and assessments (Thompson, 2011).
Distance Education, Jan 2, 2020
Participants' engagement in massive online open courses (MOOCs) is highly irregular and self-directed. It is well known in the field of television media that substantial parts of the audience tend to drop out at major episodic, or seasonal, closures, which makes creating cliff-hangers a crucial strategy to retain viewers (Bakker, 1993; Cazani, 2016; Thompson, 2003). Could there be an analogous pattern in MOOCs, with an elevated probability of dropout at major chapter transitions? Applying disjoint survival analysis on a sample of 12,913 students in a popular astronomy MOOC that built participants' cultural capital (hobbyist pursuits), we found a significant increase in dropout rates at chapter closures. Moreover, the later the chapter closure was positioned in the course sequence, the higher the dropout rate became. We found this pattern replicated in a sample of 20,134 students in a popular computer science MOOC that introduced participants to programming.
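The notion of a dropout rate at each milestone, as used in the abstract above, can be illustrated with a simple life-table calculation: of the students still enrolled at a milestone, what share stop there? The sketch below is only a descriptive estimate, not the study's statistical model, and the function name `dropout_hazard` and the ten-student sample are hypothetical values invented for illustration.

```python
from typing import Dict, List


def dropout_hazard(last_milestone: List[int], n_milestones: int) -> Dict[int, float]:
    """Estimate the discrete-time dropout hazard at each milestone.

    last_milestone[i] is the last milestone that student i completed; a value
    equal to n_milestones means the student finished the course.
    hazard[m] = (students who stopped after m) / (students who reached m).
    """
    hazard: Dict[int, float] = {}
    for m in range(1, n_milestones):
        at_risk = sum(1 for last in last_milestone if last >= m)
        dropped = sum(1 for last in last_milestone if last == m)
        if at_risk > 0:
            hazard[m] = dropped / at_risk
    return hazard


# Hypothetical sample: ten students in a course with four milestones.
sample = [1, 1, 2, 2, 2, 3, 4, 4, 4, 4]
hazard = dropout_hazard(sample, n_milestones=4)
```

Comparing such hazards at milestones that close a chapter against those that do not is one simple way to look for the elevated dropout the abstract describes; a full survival analysis would also model covariates and test significance.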
SIGCSE is packed with teaching insights and inspiration. However, we get these insights and inspiration from hearing our colleagues talk about their teaching. Why not just watch them teach? This session does exactly that. Six exceptional educators will present their favorite piece of innovative lecture content just as they would to their students. The moderator, Colleen Lewis, will describe the central pedagogical move within the innovation and how this connects to education research. The goal of the session is to inspire SIGCSE attendees by highlighting innovative instruction by exceptional educators. The specific content of the innovative instruction may be applicable for some attendees, and the discussion of the underlying pedagogical move within each innovation can be applied across the attendees' teaching.
CCS CONCEPTS • Social and professional topics → Computing education.
SIGCSE bulletin, Mar 7, 2007
Scratch is a "media-rich programming environment" recently developed by MIT's Media Lab that "lets you create your own animations, games, and interactive art." Although Scratch is intended to "enhance the development of technological fluency [among youths] at after-school centers in economically disadvantaged communities," we find remarkable potential in this programming environment for higher education as well. We propose Scratch as a first language for first-time programmers in introductory courses, for majors and non-majors alike. Scratch allows students to program with a mouse: programmatic constructs are represented as puzzle pieces that only fit together if "syntactically" appropriate. We argue that this environment allows students not only to master programmatic constructs before syntax but also to focus on problems of logic before syntax. We view Scratch as a gateway to languages like Java. To validate our proposal, we recently deployed Scratch for the first time in higher education via Harvard Summer School's Computer Science S-1: Great Ideas in Computer Science, the summertime version of a course at Harvard College. Our goal was not to improve scores but instead to improve first-time programmers' experiences. We ultimately transitioned to Java, but we first introduced programming itself via Scratch. We present in this paper the results of our trial. We find that, not only did Scratch excite students at a critical time (i.e., their first foray into computer science), it also familiarized the inexperienced among them with fundamentals of programming without the distraction of syntax. Moreover, when asked via surveys at term's end to reflect on how their initial experience with Scratch affected their subsequent experience with Java, most students (76%) felt that Scratch was a positive influence, particularly those without prior background. Those students (16%) who felt that Scratch was not an influence, positive or negative, all had prior programming experience.