CU Colloquia

List of Front Range Software Networking Events and Conferences

Updated March 21: crossed out ‘conferences’ because I don’t do a good job of listing those.
Boulder, Colorado, has a great tech scene that I’ve been a peripheral member of for a while now.  I thought I’d share a few of the places I go to network.  And by “network”, I mean learn about cool new technologies, get a feel for the state of the scene (are companies hiring?  Firing?  What technologies are in high demand?), and chat with interesting people.  All of the events below focus on software, except where noted.

NB: I have not found work through any of these events.  But if I needed work, these communities are the second place I’d look.  (The first place would be my personal network.)

Boulder Denver New Tech Meetup

  • 5 minute presentations.  Twice a month.  The audience varies wildly, from hard-core developers to marketing folks to graphic designers to upper-level execs.  Focus is on new technologies and companies.  Arrive early, because once the presentations start, it’s hard to talk to people.
  • Good for: energy, free food, broad overviews, regular meetings, reminding you of the glory days in 1999.
  • Bad for: diving deep into a subject, expanding your technical knowledge

User groups: Boulder Java Users Group, Boulder Linux Users Group, Rocky Mountain Adobe Users Group, Denver/Boulder Drupal Users Group, Denver Java Users Group, and others. (Updated 11/12 8:51: added Denver JUG.)

  • Typically one or two presentations each meeting, for an hour or two.  Tend to focus on a specific technology, as indicated by the names.  Sometimes food is provided.
  • Good for: diving deep into a technology, networking amongst fellow nerds, regular meetings
  • Bad for: anyone not interested in what they’re presenting that night, non-technical folks

Meetups (of which BDNT, covered above, is one)

  • There’s a meetup for everything under the sun.  Well, almost.  If you’re looking to focus on a particular subject, consider starting one (not free) or joining one (typically free).
  • Good for: breadth of possibility–you want to talk about Google?  How about SecondLife?
  • Bad for: many are kind of small

Startup Drinks

  • Get together in a bar and mingle. Talk about your startup dreams or realities.
  • Good: have a beer and talk tech (what’s not to like?), takes place after working hours, casual
  • Bad: hard to target who to talk to, intermittent, takes place after working hours.

BarCamp

  • Originally started, I believe, in response to FooCamp, this is an unconference.  On Friday, attendees get together and assemble an interim conference schedule.  On Saturday, they present, each slot running about an hour.  Some slots are group activities (“let’s talk about technology X”) rather than presentations.  Very free form.
  • Good: for meeting people interested in technologies, can be relatively deep introduction to a technology
  • Bad: if you need lots of structure, if you want a goodie bag from a conference, presentations can be uneven in quality, hasn’t been one in a while around here (that I know of)

Ignite

  • Presentations on a variety of topics, some geeky, some not.  Presentations are determined by vote, and each is 20 slides and 5 minutes total.  Costs something (~$10).
  • Good: happens in several cities (Denver, Boulder, Fort Collins), so it gives you a chance to meet folks in your community, presentations tend to be funny, wide range of audience
  • Bad: presentations only skim the surface of a topic, presentation quality can vary significantly, not a lot of time to talk to people since you’re mostly watching presentations

CU Computer Science colloquia

  • Run by the CU CS department, these are technical presentations.  Usually given by a visiting PhD.
  • Good: Good to see what is coming down the pike, deep exposure to topics you might never think about (“Effective and Ubiquitous Access for Blind People”, “Optimal-Rate Routing in Adversarial Networks”)
  • Bad: The ones I’ve been to had no professionals there that I could see, happen during the middle of the work day, deep exposure to topics you might not care about

Jelly

  • Cooperative work environments, hosted at a coffee shop or other location.
  • Good: informal, could be plenty of time to talk to peers
  • Bad: not sure I’ve ever heard of one happening on the Front Range, not that different from going to your local coffee shop

Boulder Open Coffee Club

  • From the website: it exists to “encourage entrepreneurs, developers and investors to organize real-world informal meetups”.  I don’t have enough data to give you good/bad points.

Startup Weekend

  • BarCamp with a focus–build a startup company.  With whoever shows up.
  • Good: focus, interesting people, you know they’re entrepreneurial enough to give up a weekend to attend, broad cross section of skills
  • Bad: you give up a weekend to attend

Refresh Denver

  • Another group that leverages meetup.com, these folks are in Denver.  Focus is on web developers and designers.  Again, I don’t have enough data to give good/bad points.

Except for Ignite, everything above is free or donation-based.  The paid conferences around Colorado that I know about, I’ll cover in a future post.

What am I missing?  I know the list is skewed towards Boulder–I haven’t really been to conferences more than an hour’s drive from Boulder.

Do you use these events as a chance to network?  Catch up with friends?  Learn about new technologies, processes and companies?

Finding bugs from software history: a talk at CU I attended

I went to an absolutely fascinating talk today at CU. Sunghun Kim gave a talk titled “Predicting Bugs by Analyzing Software History”. The basic premise is that you can look at historical unstructured information–emails, bug reports, checkin comments–and, if you can identify bugs that were related to that unstructured information, use it to find other bugs.

He talked about two different methods to ‘find other bugs’. The first is change classification. Based on a large number of factors, including attributes of the program text, complexity metrics, and source control management metadata like time of checkin (don’t check code in on Friday!) and committing developer, he was able to identify whether or not a bug was introduced by a given checkin. (A question was asked about looking at changes at the token level, and he said that would be an interesting place for further research.) This was 94% precise (if the system said a bug was introduced, there was a 94% chance it was) and had 70% recall (it missed 30% of real bugs introduced, but caught 70% of them).
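
To make those two numbers concrete, here’s a tiny worked example (mine, not from the talk); the false negative count is a made-up value chosen to match the quoted recall:

```java
// Minimal illustration (mine, not from the talk) of precision and recall.
// Suppose the classifier flagged 100 checkins as bug-introducing: 94 really
// did introduce bugs (true positives) and 6 did not (false positives), and
// 40 more bug-introducing checkins slipped past it (false negatives).
public class PrecisionRecall {
    public static void main(String[] args) {
        int truePositives = 94;
        int falsePositives = 6;
        int falseNegatives = 40;  // hypothetical, chosen to give ~70% recall

        double precision = (double) truePositives / (truePositives + falsePositives);
        double recall = (double) truePositives / (truePositives + falseNegatives);

        System.out.printf("precision = %.2f%n", precision); // 0.94
        System.out.printf("recall    = %.2f%n", recall);    // 0.70
    }
}
```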

They [he collaborates a lot] were able to judge the probability that a future change would introduce a bug by feeding all the attributes I mention above, for known bugs, to a machine learning program. Kim said there were ways of automating some of the collection of this data, but I can imagine that the quality of bug prediction depends massively on the ability to find which bugs were introduced when, and to tie known bugs to those attributes. The numbers I quote above were based on research from a number of open source projects like Apache and Mozilla, and varied quite a bit. Someone asked about the variation in the numbers, and he said that commit habits were a large cause of it. Basically, the more targeted commits were (one file instead of five, etc.), the higher the precision that could be attained. Kim also compared this type of change classification to using a GPS for directions around a city–the more unfamiliar you are with the code, the more useful it is. He mentioned that Apple and Yahoo! were using this system in their software development.

The other interesting concept he talked about was a bug cache. If you’ve developed for any length of time on a given project, you know there are places developers fear to tread. Whether it is complicated logic, fragile interfaces with legacy systems or just plain fugly code, there are sections of any codebase where change is a bit scary. Kim talked about the Windows Server 2003 team maintaining a list of such modules, so that anytime anyone changed something on that list, more review than normal would take place. This list is what he’s trying to replicate in an automated fashion.

If you place files in a cache when they are identified as having a bug, and also place files that are close in checkin time to that file in the cache, you build a set of files to review closely. For the 200-file Apache project, after about 50-100 files had passed through, the 20-file cache (10% of the project) contained a significant portion of future bugs. Across several open source projects, the portion of future bugs contained in the cache ranged from 73% to 95%. He also talked about using this at the method level as opposed to the file level.
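
Here’s a rough sketch of the bug cache idea as I understood it; the eviction policy (least recently flagged) and the treatment of co-committed files as “close in checkin time” are my guesses at details the talk glossed over:

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Rough sketch of the bug cache idea from the talk. The LRU-style eviction
// and the use of co-committed files as "close in checkin time" are my
// assumptions, not necessarily what Kim's system actually does.
public class BugCache {
    private final int capacity;               // e.g. 10% of the project's files
    private final Set<String> cache = new LinkedHashSet<>();

    public BugCache(int capacity) {
        this.capacity = capacity;
    }

    // Call this whenever a checkin is identified as bug-related.
    public void recordBugFix(String buggyFile, List<String> filesInSameCheckin) {
        touch(buggyFile);
        // Files committed alongside the buggy file are likely suspects too.
        for (String neighbor : filesInSameCheckin) {
            touch(neighbor);
        }
    }

    // Files currently flagged for extra review.
    public Set<String> filesToReviewClosely() {
        return cache;
    }

    private void touch(String file) {
        cache.remove(file);                   // re-inserting marks it most recent
        cache.add(file);
        if (cache.size() > capacity) {
            cache.remove(cache.iterator().next());  // evict the oldest entry
        }
    }
}
```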

In both these cases, machine learning done on one project is not very useful for others. (When they trained on the Mozilla codebase and then turned the system on the Eclipse codebase, it wasn’t good at finding bugs.) Kim speculated that this was due to project and personal coding styles (some people are from Mars, others write buffer overflow bugs), since a machine trained on Apache 1.3 was OK at finding bugs in the Apache 2.0 codebase.

Kim talked about several other interesting projects that he has been part of, including the ‘Memory of Similar Bugs’, which found that up to 40% of bugs are recurring patterns, and ReCrash, a probe that monitors an application for crash conditions and, when it finds one, automatically writes a unit test that can reproduce the crash. How cool is that? The cost, of course, is overhead: ReCrash’s monitoring imposes a 13-64% performance penalty.
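
As a much-simplified illustration of the ReCrash concept (mine, not the tool’s actual mechanism), you could imagine a probe that records recent method calls and, on a crash, emits a skeleton of a reproducing test:

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Much-simplified illustration of the ReCrash concept, not its actual
// mechanism: remember recent method arguments, and when a crash occurs,
// emit a unit test skeleton that replays them.
public class CrashProbe {
    private record Call(String method, Object[] args) {}

    private static final Deque<Call> recentCalls = new ArrayDeque<>();

    // Instrumentation would call this on method entry.
    public static void record(String method, Object... args) {
        recentCalls.push(new Call(method, args));
        if (recentCalls.size() > 100) {
            recentCalls.removeLast();         // keep only a bounded window
        }
    }

    // Instrumentation would call this from a top-level exception handler.
    public static void onCrash(Throwable t) {
        System.out.println("// Generated reproduction test (sketch):");
        System.out.println("@Test public void reproducesCrash() {");
        for (Call call : recentCalls) {
            System.out.printf("    // replay: %s(%s)%n",
                    call.method(), Arrays.toString(call.args()));
        }
        System.out.printf("    // expect: %s%n", t);
        System.out.println("}");
    }
}
```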

This was so fascinating to me because anything we can do to automate the bug finding process lets us build better software. There are always data input problems (GIGO still rules), but it seemed like Kim had found some ways around that, especially when the developers were good about comments on checkin. I’m all for spending more time building cool features and better business logic. All in all, a great talk about an interesting topic.

[tags]the bug in the machine, commit early and often[/tags]

MolhadoRef: A refactoring-aware version control system

This past Thursday, I went to one of the CU Comp Sci Colloquia. The speaker was Danny Dig, and he discussed refactoring-aware software merging and version control for object-oriented programs, in particular, MolhadoRef. Mr Dig billed this several times as the first refactoring-aware source control system ‘that we are aware of’.

Being relatively new to IDEs in general and Eclipse in particular, I find that automatic refactoring is one of the largest productivity enhancements I’ve discovered. The ability to almost effortlessly change a variable name to more accurately reflect its purpose, and to know that such a change will propagate everywhere the variable is used, lets you update names more often. I was recently editing a piece of PHP code where one of the variables had changed slightly in purpose. I could have done a search and replace, but I wouldn’t have been positive that the changes hadn’t broken something subtle. With Eclipse, there’s no such hesitation.

This is almost worth the cost of having to use the mouse and not being able to use vi keystrokes (at least, without the vi plugin). But when you throw in the other refactorings, including move method and rename method, well, refactoring becomes invaluable.

What follows are my notes from the talk.

Mr Dig discussed how refactoring is becoming more prevalent in today’s development environments, and how this challenges text-based version control systems such as CVS and SVN. The main issues are that refactoring causes global changes, while version control systems excel at tracking local changes, and that refactoring impacts program elements, while version control systems focus on files. In short, refactorings are more likely than normal edits to create conflicts and/or merge errors–thanks to refactoring tools, changes are regularly no longer limited to single files.

In addition, when you move members between classes, you can lose version history–the only connection back to the original location is a commit comment, which is neither dependable nor robust.

MolhadoRef aims to change this by making the version control system refactoring-aware. Currently MolhadoRef exists in ‘research project’ form as an Eclipse plugin, and is part of Mr Dig’s PhD thesis work. (By ‘research form’, I think he means not ready for prime time. For example, in his demo, when you checked changes in, there was no support for a commit message.) There are plans to release it as an open source project, but it is not currently available. MolhadoRef is based on Molhado, an SCM system with support for OO software, on the back end, with Eclipse and a refactoring engine on the front end.

Basically, MolhadoRef can handle API-level merging, where ‘API level’ means any member declarations. It supports normal text edits in the same manner as CVS or SVN, and it catches both compile-time and runtime merge conflicts.

On checkin, it follows this algorithm (a rough code sketch follows the list):

  1. Detect API change operations, using a 3-way comparison. These can also be detected with tools such as Catchup or RefactoringCrawler.
  2. Detect and resolve conflicts. It attempts to do so in an automated fashion, but sometimes requires user input.
  3. Undo refactorings. Sometimes this is not entirely possible, but if it fails, MolhadoRef is intelligent enough to fall back to a text merge.
  4. Merge text changes. This happens on a method level, not on a file level.
  5. Replay refactorings. This can be hard, because you need to serialize what may have been two parallel series of refactorings. He went into some things I didn’t understand right here, including using graph theory to reorder the refactoring, or making a refactoring into an enhanced refactoring (which not only changes code, but future refactorings in the chain).
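
Here’s a very rough sketch of that flow as I understood it; all the types and method names are my invention, not MolhadoRef’s actual API:

```java
import java.util.List;

// Very rough sketch of the checkin/merge flow described in the talk.
// Every type and method name here is my invention, not MolhadoRef's API.
public abstract class RefactoringAwareMerge {
    public Version merge(Version base, Version mine, Version theirs) {
        // 1. Three-way comparison to recover the refactorings on each branch.
        List<Refactoring> refactorings = detectRefactorings(base, mine, theirs);

        // 2. Resolve conflicts, asking the user only when automation fails.
        resolveConflicts(refactorings);

        // 3. Undo the refactorings so both branches are in pre-refactoring
        //    shape; fall back to a plain text merge if undoing fails.
        if (!undo(refactorings, mine, theirs)) {
            return textMerge(base, mine, theirs);
        }

        // 4. Merge the remaining edits at the method level, not the file level.
        Version merged = textMerge(base, mine, theirs);

        // 5. Replay the refactorings, serialized into one consistent order.
        return replay(refactorings, merged);
    }

    // The hard parts live behind these abstractions, naturally.
    abstract List<Refactoring> detectRefactorings(Version base, Version mine, Version theirs);
    abstract void resolveConflicts(List<Refactoring> refactorings);
    abstract boolean undo(List<Refactoring> refactorings, Version mine, Version theirs);
    abstract Version textMerge(Version base, Version mine, Version theirs);
    abstract Version replay(List<Refactoring> refactorings, Version merged);

    interface Version {}
    interface Refactoring {}
}
```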

Mr Dig conducted a case study and an experiment to prove the effectiveness of MolhadoRef. In the case study, he and a partner developed MolhadoRef separately for 3 weeks, and then merged the changes using both MolhadoRef and CVS. With 1267 lines of changed code, CVS had 36 conflicts, 41 compile errors, 7 runtime errors (determined by unit test coverage), and it took 105 minutes (including human time) to merge the two trees. In contrast, MolhadoRef had 1 conflict, 0 compile and runtime errors, and it took 1 minute to merge. Pretty impressive.

The experiment was to split 10 students (fairly experienced with Eclipse and Java) into two groups, each with a common task. At the end, the changes made by each person in the first group were merged with the changes made by each person in the second group, producing 25 merge sessions. Across these sessions, MolhadoRef led to 3.6 times fewer conflicts, 11.6 times fewer compile errors, and 1.5 times fewer runtime errors. In addition, merging was 3.5 times faster with MolhadoRef.

The larger picture Mr Dig is pursuing is automated upgrades of component-based applications (pdf). If you’ve ever upgraded from one version of a Java web framework to another, you’ll know how painful it can be, especially if the API has changed. According to Mr Dig, 80% of the API changes in the 5 projects he’s examined so far (4 open source, 1 proprietary) were due to refactorings. He told us why MolhadoRef fits into this picture, but I didn’t note it. Bummer.

He did say that both Microsoft and IBM were interested in his work. He also mentioned, I think, that Eclipse has already accepted some of his refactoring code.

/Notes

What does this mean for those of us who don’t build version control systems? Well, not much right now, but if some of the bigger providers start to use these techniques, bigger, longer-lived projects can be better managed: code history won’t be lost as often, and merge conflicts will decrease in severity and number. I for one can’t wait for that day.

[tags]refactoring, version control systems[/tags]

Notes from a talk about DiamondTouch

I went to another University of Colorado computer science colloquium last week, covering Selected HCI Research at MERL Technology Laboratory. I’ve blogged about some of the talks I’ve attended in the past.

This talk was largely about the DiamondTouch, but an overview of Mitsubishi Electric Research Laboratories (MERL) was also given. The DiamondTouch is essentially a tablet PC writ large–you interact through a touch screen. The biggest twist is that the touch screen can actually differentiate users, based on electrical impulses (you sit on special pads which, I believe, generate the needed electrical signatures). To see the DiamondTouch in action, check out this YouTube movie showing a user playing World of Warcraft on a DiamondTouch. (For more on YouTube licensing, check out the latest Cringely column.)

What follows are my loosely edited notes from the talk by Kent Wittenburg and Kathy Ryall.

[notes]

First, from Kent Wittenburg, one of the directors of the lab:

MERL is a research lab. They don’t do pure research–each year they have a numeric goal of business impacts. Such impacts can be a standards contribution, a product, or a feature in a product. They are associated with Mitsubishi Electric (not the car company).

Five areas of focus:

  • Computer vision–2D/3D face detection, object tracking
  • Sensor and data–indoor networks, audio classification
  • Digital Communication–UWB, mesh networking, ZigBee
  • Digital Video–MPEG encoding, highlights detection, H.264. Interesting anecdote–realtime video processing is hard, but audio processing can be easier, so they used audio processing to find highlights (GOAL!!!!!!!!!!!!) in sporting videos. This technology is currently in a product distributed in Japan.
  • Off the Desktop technologies–touch, speech, multiple display calibration, font technologies (some included in Flash 8), spoken queries

The lab tends to have a range of timelines–37% long term, 47% medium, and 16% short term.  I think that “long term” is greater than 5 years, and “short term” is less than 2 years, but I’m not positive.

Next, from Kathy Ryall, who declared she was a software person, and was focusing on the DiamondTouch technology.

The DiamondTouch is multi-user and multi-touch, and can distinguish users.  You can touch with different fingers.  The screen is debris tolerant–you can set things on it, or spill stuff on it, and it continues to work.  The DiamondTouch has legacy support, where hand gestures and pokes are interpreted as mouse clicks.  The folks at MERL (and other places) are still working on interaction standards for the screen.  The DiamondTouch allows richer interaction than the mouse, because you can use multi-finger gestures and pen-and-finger (multi-device) interaction.  It’s a whole new user interface, especially when you consider that there are multiple users touching it at one time–it can be used as a shared communal space; you can pass documents around with hand gestures, etc.

It is a USB device that should just plug in and work. Commercial developer kits are available in C++, C, Java, and ActiveX. There’s also a Flash library for rapid prototyping of applications. DiamondSpin is an open source Java interface to some of the DiamondTouch capabilities. The folks at MERL are also involved right now in wrapping other APIs for the DiamondTouch.

There are two sizes of DiamondTouch–81 and 107 (I think those are the diagonal measurements). One of these tables costs around $10,000, so it seems limited to large companies and universities for a while. MERL is also working on DiamondSpace, which extends the DiamondTouch technology to walls, laptops, etc.

[end of notes]

It’s a fascinating technology–I’m not quite sure how I’d use it as a PC replacement, but I could see it (once the cost is more reasonable) as a bulletin board replacement. Applications that might benefit from multiple-user interaction and a larger screen (larger in size, though not in resolution, I believe), like drafting and gaming, would be natural for this technology too.

CU Talk: Supporting the Software Revolution

Last Thursday, I went to a talk at the University of Colorado called “Supporting the Software Revolution” (one of the CU CS Colloquia), about software and the problems it faces today. Amer Diwan spoke about some of his research and how it deals with modern software programs, which are becoming larger and larger, with the following ramifications:

They are:
  1. harder to write–more collaborative
  2. harder to understand
  3. more resource intensive

He talked about some of his research in the educational sphere, where he was working against the traditional engineering bias against collaboration by training students to work in teams. Amer also mentioned his research into tools to help discover documentation to increase understanding of large programs. But the meat of his research, as well as the focus of this talk, was on technologies to improve performance, including hardware aware software and visualization.

Amer primarily discussed vertical profiling. The premise is that, because of the multilayered nature of today’s applications (application on top of framework on top of virtual machine on top of hardware), it is not enough to simply profile the application and the hardware, since each level can interact with every other level in non-intuitive ways.

The answer is vertical profiling, where you instrument each layer appropriately. (Some layers, such as the Jikes RVM, come pre-instrumented.) Find a target metric, like instructions per cycle. Instrument all the other metrics of interest (for example, new object allocations could be instrumented at the virtual machine level). Then align all these metrics with the one common metric to combat nondeterministic behavior.

(This is where I get a bit fuzzy–I believe they used some kind of algorithm to match up the instructions per cycle with the other interesting metrics. He mentioned an algorithm that had previously been used for speech recognition. I’m not sure how to align three variables [time, instructions per cycle, and one other interesting metric] on one chart.)
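
He didn’t name the algorithm (or I didn’t catch it), but the classic speech recognition technique for aligning two time series is dynamic time warping, so here’s a minimal sketch under the assumption–and it is only an assumption–that this is what was used:

```java
// Minimal dynamic time warping (DTW) sketch. DTW is the classic speech
// recognition alignment technique, so I'm assuming (not certain) that it
// is what the vertical profiling work used to align metric traces.
public class Dtw {
    // Returns the DTW distance between two metric traces, e.g. instructions
    // per cycle over time vs. object allocation rate over time.
    public static double distance(double[] a, double[] b) {
        int n = a.length, m = b.length;
        double[][] cost = new double[n + 1][m + 1];
        for (double[] row : cost) {
            java.util.Arrays.fill(row, Double.POSITIVE_INFINITY);
        }
        cost[0][0] = 0;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                double d = Math.abs(a[i - 1] - b[j - 1]);
                // A warped alignment may stretch or compress either trace.
                cost[i][j] = d + Math.min(cost[i - 1][j - 1],
                                 Math.min(cost[i - 1][j], cost[i][j - 1]));
            }
        }
        return cost[n][m];
    }

    public static void main(String[] args) {
        double[] ipc    = {1.2, 1.1, 0.4, 0.5, 1.0};  // instructions per cycle
        double[] allocs = {0.1, 0.1, 0.9, 0.8, 0.2};  // normalized allocation rate
        // A low distance suggests the two metrics move together in time.
        System.out.println(distance(ipc, allocs));
    }
}
```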

Then, after all the metrics have been aligned in time, look for interesting and disturbing patterns–this is where the humans come in, ranking the graphs by similarity. Then see if one metric depends on another–if so, you can discard the dependent metric, since you’re looking for the root issue. After you think you have found the root issue, make a change to address it, and profile the stack again. If the pattern of interest is gone, you are validated and have addressed the root issue–otherwise, it’s back to the drawing board.

This was interesting because of the alignment in time of different metrics (which is, unfortunately, the piece I understand and remember the least). Other than that, it was a well thought out and methodical explication of profiling common knowledge: change one thing at a time, look for dependencies, validate your suppositions, be aware of interactions between the layers of your application. It is nice to see someone trying to turn the black art of performance tuning into a bit more of a science.

So, if you’re ever on a project that has the time and budget to deal with performance issues in a methodical manner, vertical profiling is worth a look. More here:

  • Performance explorer: understanding Java application behavior
  • One of Diwan’s students’ research pages

RIFLE: User Centric Information Flow Security

I went to a talk yesterday about RIFLE: An Architectural Framework for User-Centric Information-Flow Security, one of a series of University of Colorado CS Colloquia. “User-Centric Information-Flow Security” (UCIFS) is a different way of enforcing security than almost anything I’ve encountered before. Basically, instead of assigning permissions to users and actions, a la JAAS, you assign levels of security to data. This security level is then tracked throughout the application, and at various endpoints (I/O activity, network transmission) a policy is enforced. Therefore, you could tag an SSN with a high security level, and any variables and decisions based on the SSN would be tagged similarly, since security levels propagate. Then, when some piece of malware tries to send your SSN (or anything related to it) off to Russia, the system intervenes.
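
As a toy illustration of the “security levels propagate with the data” idea (mine, and far simpler than RIFLE’s binary-level register tracking):

```java
// Toy illustration (mine, far simpler than RIFLE's binary-level tracking)
// of data carrying a security label that propagates through computation
// and is checked at an output endpoint.
public class Labeled {
    enum Level { PUBLIC, SECRET }

    final String value;
    final Level level;

    Labeled(String value, Level level) {
        this.value = value;
        this.level = level;
    }

    // Anything derived from labeled data inherits the highest label involved.
    Labeled concat(Labeled other) {
        Level combined = (level == Level.SECRET || other.level == Level.SECRET)
                ? Level.SECRET : Level.PUBLIC;
        return new Labeled(value + other.value, combined);
    }

    // Policy is enforced at the endpoint, not at each computation.
    static void sendOverNetwork(Labeled data) {
        if (data.level == Level.SECRET) {
            throw new SecurityException("policy: SECRET data may not leave the host");
        }
        System.out.println("sending: " + data.value);
    }

    public static void main(String[] args) {
        Labeled ssn = new Labeled("078-05-1120", Level.SECRET);  // a famously fake SSN
        Labeled greeting = new Labeled("hello ", Level.PUBLIC);
        sendOverNetwork(greeting);                    // fine: PUBLIC
        try {
            sendOverNetwork(greeting.concat(ssn));    // blocked: label propagated
        } catch (SecurityException e) {
            System.out.println("blocked: " + e.getMessage());
        }
    }
}
```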

I say UCIFS is a “different way of enforcing security than almost anything I’ve encountered” above because there’s one thing I’ve seen that does assign a security level to some kinds of data: Perl’s taint mode. I’ve used taint mode in Perl CGI scripts before, and it’s a good way to make sure that untrusted data is not used in dangerous situations without the programmer’s explicit knowledge.

However, UCIFS aims a bit higher. An ideal system tracks data and its levels through all algorithms, doesn’t leak data, requires no effort from a programmer and enforces policies dynamically. According to the presenter, it turns out that no system can have zero data leakage. You can always signal the state of a variable in some way, even if it’s as crude as ceasing the operation of the program–these are called ‘covert channels’. RIFLE meets the other criteria, apparently, and does so by operating on binaries and tracking the data via extra registers (I’m on thin ice here, since I’m by no means a systems programmer).

It was an interesting talk because tracking security based on data, and giving users choices for data security, sure seems a better way of dealing with security issues than the program level trust that firewalls and ACLs now provide. Not a whole lot of real world applicability just yet (creating policies was barely touched upon, for one thing), but perhaps in the future. For more, please check out the Liberty Research Group’s website.