This is intended to be a more theoretical discussion of 'when to break up files', from my aged and crusty standpoint. Others may have different views and opinions, but this is a good description of how I've arrived at my own break points and thresholds for breaking up files.
The basic concept that I believe is most critical to code (and other project) organization boils down to overall search and retrieval time. My ability to find things gates nearly everything I do - how quickly can I find an interface, how quickly can I find a header, how quickly can I find the function that X calls. Because of this, I pretty much optimize my directories, files, and even file internals for manual search time.
When you can find something quickly:
1) you forget less of what you were actually doing
2) you're less likely to get distracted by other stuff
3) if it's quick enough, your tools can act as level 1 or 2 cache for your brain
The problem is that "finding something quickly" is not at all trivial. Let's take directory/file organization as an example:
- If you have too few directories, you'll have so much file spam that you can't find anything without actually looking for it by name.
- If you have too many directories, you'll also have so much directory spam that you can't find anything - but this is worse than having too many files, because now you have to look by name one at a time in the various subtrees.
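To put rough numbers on the trade-off above, here's a toy model in Python. It's a sketch under assumptions that are mine, not measurements: you already know which directory to descend into at each level, you scan each listing linearly (so about half of it on average), and every descent carries a fixed re-orientation cost. The names `levels_needed`, `manual_search_cost`, and `step_cost` are made up for illustration. It does not model the worse case described above, where you have to hunt through several subtrees because you don't know which one holds the file.

```python
def levels_needed(total_files: int, fanout: int) -> int:
    """Smallest depth d such that fanout**d >= total_files."""
    if total_files < 1 or fanout < 2:
        raise ValueError("need total_files >= 1 and fanout >= 2")
    depth = 1
    capacity = fanout
    while capacity < total_files:
        capacity *= fanout
        depth += 1
    return depth


def manual_search_cost(total_files: int, fanout: int, step_cost: float = 5.0) -> float:
    """Rough cost, in 'entries looked at', of finding one file by hand.

    Assumptions (illustrative only):
      - each listing holds `fanout` entries and is scanned linearly,
        so (fanout + 1) / 2 entries are read on average per level
      - each level adds `step_cost` for opening the directory and
        re-orienting yourself
    """
    per_level = step_cost + (fanout + 1) / 2
    return levels_needed(total_files, fanout) * per_level


if __name__ == "__main__":
    for fanout in (2, 10, 1000):
        print(fanout, manual_search_cost(1000, fanout))
```

With 1000 files and the default `step_cost`, this model gives 65.0 for two entries per directory, 31.5 for ten, and 505.5 for one flat directory of a thousand. The exact numbers mean little, but the shape is the point: both extremes lose to something in the middle.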
Let's look at high speed search routines used in the real world for a moment. In rough order of complexity, here's a list of some commonly used lookup structures:
- sparse hash table - O(1)
- sparse radix tables - O(log(N)/log(r))
- binary tree - O(log(N)/log(2))
- list search - O(N)
As is commonly known, the performance difference between these is largely bought with memory: sparse hash tables and radix tables are far and away the fastest, but they also consume the most memory.
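For a feel of how far apart those growth rates are, here's a small Python sketch that just evaluates the formulas from the list above for a given N. It ignores constant factors, collisions, and cache effects entirely, and the function name `lookup_steps` and the default radix of 16 are my own choices for illustration.

```python
import math


def lookup_steps(n: int, radix: int = 16) -> dict:
    """Order-of-growth step counts for finding one item among n.

    These are the textbook formulas only; constants are ignored.
    """
    if n < 1 or radix < 2:
        raise ValueError("need n >= 1 and radix >= 2")
    return {
        "sparse hash table": 1.0,
        "sparse radix table": math.log(n) / math.log(radix),
        "binary tree": math.log(n) / math.log(2),
        "list search": float(n),
    }


if __name__ == "__main__":
    for name, steps in lookup_steps(65536).items():
        print(f"{name:20s} {steps:10.1f}")
```

For 65536 items that works out to about 1 step for the hash table, about 4 for a radix-16 table, about 16 for the binary tree, and 65536 for the list.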
This directly relates to your brain when searching for things in your directory structure. You can hold a given number of things in your head for quick access based on the number of neurons you've got and how they're organized. This number of things you can easily manage should, IMHO, be the largest factor in deciding where to set organizational breakpoints.
One common problem I've seen with this is that smart people are generally the ones who decide on the organization, and smart people are rarer than less smart people. If you're really smart, but you're working with people who are less smart, it's really easy to set the organizational bar too high for everyone else.
The companies I have worked for professionally have all set the bar higher than I would have preferred - not because it hindered me or other high fliers, but because it hindered our support staff and lowered everyone's net productivity. If you're working with multiple people, this is a critically important thing to keep in mind.
Another downside to setting the organizational bar high is that while you might be able to handle it right now, after five years, you probably can't. That directory and file structure that was so easy to navigate when you were neck deep in it suddenly looks foreign after you've been writing GUI libraries for three years.
Keep in mind that search time is the average search time, not the minimum search time. It's ok if some stuff takes forever to find - if you only access it once every four years. The stuff that's used most often should always be at hand. Also remember that what's used often today probably won't be tomorrow.
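The "average, not minimum" point can be written as a weighted mean: each item's search time counts in proportion to how often you actually go looking for it. Here's a minimal Python sketch; the function name `average_search_time` and the sample numbers are invented for illustration, not measured.

```python
def average_search_time(items):
    """Access-weighted mean search time.

    `items` is a list of (accesses, seconds_to_find) pairs.
    """
    items = list(items)
    total_accesses = sum(accesses for accesses, _ in items)
    if total_accesses == 0:
        raise ValueError("need at least one access")
    weighted = sum(accesses * seconds for accesses, seconds in items)
    return weighted / total_accesses


if __name__ == "__main__":
    # Hypothetical workload: a header opened 200 times at 2 seconds each,
    # and an archived file dug up once at 10 minutes.
    workload = [(200, 2.0), (1, 600.0)]
    print(round(average_search_time(workload), 2))
```

In this made-up workload the ten-minute dig only pulls the average up to about 5 seconds. Shaving one second off the header lookup would save more total time than making the archive instant.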
In short, this is a big, ugly optimization problem with no simple, easy, or obvious answer. Everyone is going to have different breakpoints and different ways in which they run into the limitations of their system. In my opinion, it's worth spending at least a few percent of your overall time, on a continuous basis, examining your current setup; changes in organization often have extremely high leveraging factors.
-dentin
Alter Aeon MUD
http://www.alteraeon.com