Subversion Repositories DevTools

Rev

Rev 227 | Details | Compare with Previous | Last modification | View Log | RSS feed

Rev Author Line No. Line
227 dpurdie 1
#!/bin/sh
2
exec perl -w -x $0 ${1+"$@"} # -*- mode: perl; perl-indent-level: 2; -*-
3
#!perl -w
4
 
5
##############################################################
6
###                                                        ###
7
### cvs2cl.pl: produce ChangeLog(s) from `cvs log` output. ###
8
###                                                        ###
9
##############################################################
10
 
11
## $Revision: 1.1 $
12
## $Date: 2003/07/03 15:15:03 $
13
## $Author: ayoung $
14
##
15
##   (C) 1999 Karl Fogel <kfogel@red-bean.com>, under the GNU GPL.
16
## 
17
##   (Extensively hacked on by Melissa O'Neill <oneill@cs.sfu.ca>.)
18
##
19
## cvs2cl.pl is free software; you can redistribute it and/or modify
20
## it under the terms of the GNU General Public License as published by
21
## the Free Software Foundation; either version 2, or (at your option)
22
## any later version.
23
##
24
## cvs2cl.pl is distributed in the hope that it will be useful,
25
## but WITHOUT ANY WARRANTY; without even the implied warranty of
26
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
27
## GNU General Public License for more details.
28
##
29
## You may have received a copy of the GNU General Public License
30
## along with cvs2cl.pl; see the file COPYING.  If not, write to the
31
## Free Software Foundation, Inc., 59 Temple Place - Suite 330,
32
## Boston, MA 02111-1307, USA.
33
 
34
 
35
 
36
use strict;
37
use Text::Wrap;
38
use Time::Local;
39
use File::Basename;
40
 
41
 
42
# The Plan:
43
#
44
# Read in the logs for multiple files, spit out a nice ChangeLog that
45
# mirrors the information entered during `cvs commit'.
46
#
47
# The problem presents some challenges. In an ideal world, we could
48
# detect files with the same author, log message, and checkin time --
49
# each <filelist, author, time, logmessage> would be a changelog entry.
50
# We'd sort them; and spit them out.  Unfortunately, CVS is *not atomic*
51
# so checkins can span a range of times.  Also, the directory structure
52
# could be hierarchical.
53
#
54
# Another question is whether we really want to have the ChangeLog
55
# exactly reflect commits. An author could issue two related commits,
56
# with different log entries, reflecting a single logical change to the
57
# source. GNU style ChangeLogs group these under a single author/date.
58
# We try to do the same.
59
#
60
# So, we parse the output of `cvs log', storing log messages in a
61
# multilevel hash that stores the mapping:
62
#   directory => author => time => message => filelist
63
# As we go, we notice "nearby" commit times and store them together
64
# (i.e., under the same timestamp), so they appear in the same log
65
# entry.
66
#
67
# When we've read all the logs, we twist this mapping into
68
# a time => author => message => filelist mapping for each directory.
69
#
70
# If we're not using the `--distributed' flag, the directory is always
71
# considered to be `./', even as descend into subdirectories.
72
 
73
 
74
############### Globals ################
75
 
76
 
77
# What we run to generate it:
78
my $Log_Source_Command = "cvs log";
79
 
80
# In case we have to print it out:
81
my $VERSION = '$Revision: 1.1 $';
82
$VERSION =~ s/\S+\s+(\S+)\s+\S+/$1/;
83
 
84
## Vars set by options:
85
 
86
# Print debugging messages?
87
my $Debug = 0;
88
 
89
# Just show version and exit?
90
my $Print_Version = 0;
91
 
92
# Just print usage message and exit?
93
my $Print_Usage = 0;
94
 
95
# Single top-level ChangeLog, or one per subdirectory?
96
my $Distributed = 0;
97
 
98
# What file should we generate (defaults to "ChangeLog")?
99
my $Log_File_Name = "ChangeLog";
100
 
101
# Grab most recent entry date from existing ChangeLog file, just add
102
# to that ChangeLog.
103
my $Cumulative = 0;
104
 
105
# Expand usernames to email addresses based on a map file?
106
my $User_Map_File = "";
107
 
108
# Output to a file or to stdout?
109
my $Output_To_Stdout = 0;
110
 
111
# Eliminate empty log messages?
112
my $Prune_Empty_Msgs = 0;
113
 
114
# Don't call Text::Wrap on the body of the message
115
my $No_Wrap = 0;
116
 
117
# Separates header from log message.  Code assumes it is either " " or
118
# "\n\n", so if there's ever an option to set it to something else,
119
# make sure to go through all conditionals that use this var.
120
my $After_Header = " ";
121
 
122
# Format more for programs than for humans.
123
my $XML_Output = 0;
124
 
125
# Do some special tweaks for log data that was written in FSF
126
# ChangeLog style.
127
my $FSF_Style = 0;
128
 
129
# Show times in UTC instead of local time
130
my $UTC_Times = 0;
131
 
132
# Show day of week in output?
133
my $Show_Day_Of_Week = 0;
134
 
135
# Show revision numbers in output?
136
my $Show_Revisions = 0;
137
 
138
# Show tags (symbolic names) in output?
139
my $Show_Tags = 0;
140
 
141
# Show branches by symbolic name in output?
142
my $Show_Branches = 0;
143
 
144
# Show only revisions on these branches or their ancestors.
145
my @Follow_Branches;
146
 
147
# Don't bother with files matching this regexp.
148
my @Ignore_Files;
149
 
150
# How exactly we match entries.  We definitely want "o",
151
# and user might add "i" by using --case-insensitive option.
152
my $Case_Insensitive = 0;
153
 
154
# Maybe only show log messages matching a certain regular expression.
155
my $Regexp_Gate = "";
156
 
157
# Pass this global option string along to cvs, to the left of `log':
158
my $Global_Opts = "";
159
 
160
# Pass this option string along to the cvs log subcommand:
161
my $Command_Opts = "";
162
 
163
# Read log output from stdin instead of invoking cvs log?
164
my $Input_From_Stdin = 0;
165
 
166
# Don't show filenames in output.
167
my $Hide_Filenames = 0;
168
 
169
# Max checkin duration. CVS checkin is not atomic, so we may have checkin
170
# times that span a range of time. We assume that checkins will last no
171
# longer than $Max_Checkin_Duration seconds, and that similarly, no
172
# checkins will happen from the same users with the same message less
173
# than $Max_Checkin_Duration seconds apart.
174
my $Max_Checkin_Duration = 180;
175
 
176
# What to put at the front of [each] ChangeLog.  
177
my $ChangeLog_Header = "";
178
 
179
## end vars set by options.
180
 
181
# In 'cvs log' output, one long unbroken line of equal signs separates
182
# files:
183
my $file_separator = "======================================="
184
                   . "======================================";
185
 
186
# In 'cvs log' output, a shorter line of dashes separates log messages
187
# within a file:
188
my $logmsg_separator = "----------------------------";
189
 
190
 
191
############### End globals ############
192
 
193
 
194
 
195
 
196
&parse_options ();
197
&derive_change_log ();
198
 
199
 
200
 
201
### Everything below is subroutine definitions. ###
202
 
203
# If accumulating, grab the boundary date from pre-existing ChangeLog.
204
sub maybe_grab_accumulation_date ()
205
{
206
  if (! $Cumulative) {
207
    return "";
208
  }
209
 
210
  # else
211
 
212
  open (LOG, "$Log_File_Name")
213
      or die ("trouble opening $Log_File_Name for reading ($!)");
214
 
215
  my $boundary_date;
216
  while (<LOG>)
217
  {
218
    if (/^(\d\d\d\d-\d\d-\d\d\s+\d\d:\d\d)/)
219
    {
220
      $boundary_date = "$1";
221
      last;
222
    }
223
  }
224
 
225
  close (LOG);
226
  return $boundary_date;
227
}
228
 
229
 
230
# Fills up a ChangeLog structure in the current directory.
231
sub derive_change_log ()
232
{
233
  # See "The Plan" above for a full explanation.
234
 
235
  my %grand_poobah;
236
 
237
  my $file_full_path;
238
  my $time;
239
  my $revision;
240
  my $author;
241
  my $msg_txt;
242
  my $detected_file_separator;
243
 
244
  # Might be adding to an existing ChangeLog
245
  my $accumulation_date = &maybe_grab_accumulation_date ();
246
  if ($accumulation_date) {
247
    $Log_Source_Command .= " -d\'>${accumulation_date}\'";
248
  }
249
 
250
  # We might be expanding usernames
251
  my %usermap;
252
 
253
  # In general, it's probably not very maintainable to use state
254
  # variables like this to tell the loop what it's doing at any given
255
  # moment, but this is only the first one, and if we never have more
256
  # than a few of these, it's okay.
257
  my $collecting_symbolic_names = 0;
258
  my %symbolic_names;    # Where tag names get stored.
259
  my %branch_names;      # We'll grab branch names while we're at it.
260
  my %branch_numbers;    # Save some revisions for @Follow_Branches
261
  my @branch_roots;      # For showing which files are branch ancestors.
262
 
263
  # Bleargh.  Compensate for a deficiency of custom wrapping.
264
  if (($After_Header ne " ") and $FSF_Style)
265
  {
266
    $After_Header .= "\t";
267
  }
268
 
269
  if (! $Input_From_Stdin) {
270
    open (LOG_SOURCE, "$Log_Source_Command |")
271
        or die "unable to run \"${Log_Source_Command}\"";
272
  }
273
  else {
274
    open (LOG_SOURCE, "-") or die "unable to open stdin for reading";
275
  }
276
 
277
  %usermap = &maybe_read_user_map_file ();
278
 
279
  while (<LOG_SOURCE>)
280
  {
281
    # If on a new file and don't see filename, skip until we find it, and
282
    # when we find it, grab it.
283
    if ((! (defined $file_full_path)) and /^Working file: (.*)/) 
284
    {
285
      $file_full_path = $1;
286
      if (@Ignore_Files) 
287
      {
288
        my $base;
289
        ($base, undef, undef) = fileparse ($file_full_path);
290
        # Ouch, I wish trailing operators in regexps could be
291
        # evaluated on the fly!
292
        if ($Case_Insensitive) {
293
          if (grep ($file_full_path =~ m|$_|i, @Ignore_Files)) {
294
            undef $file_full_path;
295
          }
296
        }
297
        elsif (grep ($file_full_path =~ m|$_|, @Ignore_Files)) {
298
          undef $file_full_path;
299
        }
300
      }
301
      next;
302
    }
303
 
304
    # Just spin wheels if no file defined yet.
305
    next if (! $file_full_path);
306
 
307
    # Collect tag names in case we're asked to print them in the output.
308
    if (/^symbolic names:$/) {
309
      $collecting_symbolic_names = 1;
310
      next;  # There's no more info on this line, so skip to next
311
    }
312
    if ($collecting_symbolic_names)
313
    {
314
      # All tag names are listed with whitespace in front in cvs log
315
      # output; so if see non-whitespace, then we're done collecting.
316
      if (/^\S/) {
317
        $collecting_symbolic_names = 0;
318
      }
319
      else    # we're looking at a tag name, so parse & store it
320
      {
321
        # According to the Cederqvist manual, in node "Tags", tag
322
        # names must start with an uppercase or lowercase letter and
323
        # can contain uppercase and lowercase letters, digits, `-',
324
        # and `_'.  However, it's not our place to enforce that, so
325
        # we'll allow anything CVS hands us to be a tag:
326
        /^\s+([^:]+): ([\d.]+)$/;
327
        my $tag_name = $1;
328
        my $tag_rev  = $2;
329
 
330
        # A branch number either has an odd number of digit sections
331
        # (and hence an even number of dots), or has ".0." as the
332
        # second-to-last digit section.  Test for these conditions.
333
        my $real_branch_rev = "";
334
        if (($tag_rev =~ /^(\d+\.\d+\.)+\d+$/)   # Even number of dots...
335
            and (! ($tag_rev =~ /^(1\.)+1$/)))   # ...but not "1.[1.]1"
336
        {
337
          $real_branch_rev = $tag_rev;
338
        }
339
        elsif ($tag_rev =~ /(\d+\.(\d+\.)+)0.(\d+)/)  # Has ".0."
340
        {
341
          $real_branch_rev = $1 . $3;
342
        }
343
        # If we got a branch, record its number.
344
        if ($real_branch_rev)
345
        {
346
          $branch_names{$real_branch_rev} = $tag_name;
347
          if (@Follow_Branches) {
348
            if (grep ($_ eq $tag_name, @Follow_Branches)) {
349
              $branch_numbers{$tag_name} = $real_branch_rev;
350
            }
351
          }
352
        }
353
        else {
354
          # Else it's just a regular (non-branch) tag.
355
          push (@{$symbolic_names{$tag_rev}}, $tag_name);
356
        }
357
      }
358
    }
359
    # End of code for collecting tag names.
360
 
361
    # If have file name, but not revision, and see revision, then grab
362
    # it.  (We collect unconditionally, even though we may or may not
363
    # ever use it.)
364
    if ((! (defined $revision)) and (/^revision (\d+\.[\d.]+)/))
365
    {
366
      $revision = $1;
367
 
368
      if (@Follow_Branches)
369
      {
370
        foreach my $branch (@Follow_Branches) 
371
        {
372
          # Special case for following trunk revisions
373
          if (($branch =~ /^trunk$/i) and ($revision =~ /^[0-9]+\.[0-9]+$/))
374
          {
375
            goto dengo;
376
          }
377
 
378
          my $branch_number = $branch_numbers{$branch};
379
          if ($branch_number) 
380
          {
381
            # Are we on one of the follow branches or an ancestor of
382
            # same?
383
            #
384
            # If this revision is a prefix of the branch number, or
385
            # possibly is less in the minormost number, OR if this
386
            # branch number is a prefix of the revision, then yes.
387
            # Otherwise, no.
388
            #
389
            # So below, we determine if any of those conditions are
390
            # met.
391
 
392
            # Trivial case: is this revision on the branch?
393
            # (Compare this way to avoid regexps that screw up Emacs
394
            # indentation, argh.)
395
            if ((substr ($revision, 0, ((length ($branch_number)) + 1)))
396
                eq ($branch_number . "."))
397
            {
398
              goto dengo;
399
            }
400
            # Non-trivial case: check if rev is ancestral to branch
401
            elsif ((length ($branch_number)) > (length ($revision)))
402
            {
403
              $revision =~ /^((?:\d+\.)+)(\d+)$/;
404
              my $r_left = $1;          # still has the trailing "."
405
              my $r_end = $2;
406
 
407
              $branch_number =~ /^((?:\d+\.)+)(\d+)\.\d+$/;
408
              my $b_left = $1;  # still has trailing "."
409
              my $b_mid  = $2;   # has no trailing "."
410
 
411
              if (($r_left eq $b_left)
412
                  && ($r_end <= $b_mid))
413
              {
414
                goto dengo;
415
              }
416
            }
417
          }
418
        }
419
      }
420
      else    # (! @Follow_Branches)
421
      {
422
        next;
423
      }
424
 
425
      # Else we are following branches, but this revision isn't on the
426
      # path.  So skip it.
427
      undef $revision;
428
    dengo:
429
      next;
430
    }
431
 
432
    # If we don't have a revision right now, we couldn't possibly
433
    # be looking at anything useful. 
434
    if (! (defined ($revision))) {
435
      $detected_file_separator = /^$file_separator$/o;
436
      if ($detected_file_separator) {
437
        # No revisions for this file; can happen, e.g. "cvs log -d DATE"
438
        goto CLEAR;
439
      }
440
      else {
441
        next;
442
      }
443
    }
444
 
445
    # If have file name but not date and author, and see date or
446
    # author, then grab them:
447
    unless (defined $time) 
448
    {
449
      if (/^date: .*/)
450
      {
451
        ($time, $author) = &parse_date_and_author ($_);
452
        if (defined ($usermap{$author}) and $usermap{$author}) {
453
          $author = $usermap{$author};
454
        }
455
      }
456
      else {
457
        $detected_file_separator = /^$file_separator$/o;
458
        if ($detected_file_separator) {
459
          # No revisions for this file; can happen, e.g. "cvs log -d DATE"
460
          goto CLEAR;
461
        }
462
      }
463
      # If the date/time/author hasn't been found yet, we couldn't
464
      # possibly care about anything we see.  So skip:
465
      next;
466
    }
467
 
468
    # A "branches: ..." line here indicates that one or more branches
469
    # are rooted at this revision.  If we're showing branches, then we
470
    # want to show that fact as well, so we collect all the branches
471
    # that this is the latest ancestor of and store them in
472
    # @branch_roots.  Just for reference, the format of the line we're
473
    # seeing at this point is:
474
    #
475
    #    branches:  1.5.2;  1.5.4;  ...;
476
    #
477
    # Okay, here goes:
478
 
479
    if (/^branches:\s+(.*);$/)
480
    {
481
      if ($Show_Branches)
482
      {
483
        my $lst = $1;
484
        $lst =~ s/(1\.)+1;|(1\.)+1$//;  # ignore the trivial branch 1.1.1
485
        if ($lst) {
486
          @branch_roots = split (/;\s+/, $lst);
487
        }
488
        else {
489
          undef @branch_roots;
490
        }
491
        next;
492
      }
493
      else
494
      {
495
        # Ugh.  This really bothers me.  Suppose we see a log entry
496
        # like this:
497
        #
498
        #    ----------------------------
499
        #    revision 1.1
500
        #    date: 1999/10/17 03:07:38;  author: jrandom;  state: Exp;
501
        #    branches:  1.1.2;
502
        #    Intended first line of log message begins here.
503
        #    ----------------------------
504
        #
505
        # The question is, how we can tell the difference between that
506
        # log message and a *two*-line log message whose first line is
507
        # 
508
        #    "branches:  1.1.2;"
509
        #
510
        # See the problem?  The output of "cvs log" is inherently
511
        # ambiguous.
512
        #
513
        # For now, we punt: we liberally assume that people don't
514
        # write log messages like that, and just toss a "branches:"
515
        # line if we see it but are not showing branches.  I hope no
516
        # one ever loses real log data because of this.
517
        next;
518
      }
519
    }
520
 
521
    # If have file name, time, and author, then we're just grabbing
522
    # log message texts:
523
    $detected_file_separator = /^$file_separator$/o;
524
    if ($detected_file_separator && ! (defined $revision)) {
525
      # No revisions for this file; can happen, e.g. "cvs log -d DATE"
526
      goto CLEAR;
527
    }
528
    unless ($detected_file_separator || /^$logmsg_separator$/o)
529
    {
530
      $msg_txt .= $_;   # Normally, just accumulate the message...
531
      next;
532
    }
533
    # ... until a msg separator is encountered:
534
    # Ensure the message contains something:
535
    if ((! $msg_txt)
536
        || ($msg_txt =~ /^\s*\.\s*$|^\s*$/)
537
        || ($msg_txt =~ /\*\*\* empty log message \*\*\*/)) 
538
    {
539
      if ($Prune_Empty_Msgs) {
540
        goto CLEAR;
541
      }
542
      # else
543
      $msg_txt = "[no log message]\n";
544
    }
545
 
546
    ### Store it all in the Grand Poobah:
547
    {
548
      my $dir_key;        # key into %grand_poobah
549
      my %qunk;           # complicated little jobbie, see below
550
 
551
      # Each revision of a file has a little data structure (a `qunk') 
552
      # associated with it.  That data structure holds not only the
553
      # file's name, but any additional information about the file
554
      # that might be needed in the output, such as the revision
555
      # number, tags, branches, etc.  The reason to have these things
556
      # arranged in a data structure, instead of just appending them
557
      # textually to the file's name, is that we may want to do a
558
      # little rearranging later as we write the output.  For example,
559
      # all the files on a given tag/branch will go together, followed
560
      # by the tag in parentheses (so trunk or otherwise non-tagged
561
      # files would go at the end of the file list for a given log
562
      # message).  This rearrangement is a lot easier to do if we
563
      # don't have to reparse the text.
564
      #
565
      # A qunk looks like this:
566
      #
567
      #   { 
568
      #     filename    =>    "hello.c",
569
      #     revision    =>    "1.4.3.2",
570
      #     time        =>    a timegm() return value (moment of commit)
571
      #     tags        =>    [ "tag1", "tag2", ... ],
572
      #     branch      =>    "branchname" # There should be only one, right?
573
      #     branchroots =>    [ "branchtag1", "branchtag2", ... ]
574
      #   }
575
 
576
      if ($Distributed) {
577
        # Just the basename, don't include the path.
578
        ($qunk{'filename'}, $dir_key, undef) = fileparse ($file_full_path);
579
      }
580
      else {
581
        $dir_key = "./";
582
        $qunk{'filename'} = $file_full_path;
583
      }
584
 
585
      # This may someday be used in a more sophisticated calculation
586
      # of what other files are involved in this commit.  For now, we
587
      # don't use it, because the common-commit-detection algorithm is
588
      # hypothesized to be "good enough" as it stands.
589
      $qunk{'time'} = $time;
590
 
591
      # We might be including revision numbers and/or tags and/or
592
      # branch names in the output.  Most of the code from here to
593
      # loop-end deals with organizing these in qunk.
594
 
595
      $qunk{'revision'} = $revision;
596
 
597
      # Grab the branch, even though we may or may not need it:
598
      $qunk{'revision'} =~ /((?:\d+\.)+)\d+/;
599
      my $branch_prefix = $1;
600
      $branch_prefix =~ s/\.$//;  # strip off final dot
601
      if ($branch_names{$branch_prefix}) {
602
        $qunk{'branch'} = $branch_names{$branch_prefix};
603
      }
604
 
605
      # If there's anything in the @branch_roots array, then this
606
      # revision is the root of at least one branch.  We'll display
607
      # them as branch names instead of revision numbers, the
608
      # substitution for which is done directly in the array:
609
      if (@branch_roots) {
610
        my @roots = map { $branch_names{$_} } @branch_roots;
611
        $qunk{'branchroots'} = \@roots;
612
      }
613
 
614
      # Save tags too.
615
      if (defined ($symbolic_names{$revision})) {
616
        $qunk{'tags'} = $symbolic_names{$revision};
617
        delete $symbolic_names{$revision};
618
      }
619
 
620
      # Add this file to the list
621
      # (We use many spoonfuls of autovivication magic. Hashes and arrays
622
      # will spring into existence if they aren't there already.)
623
 
624
      &debug ("(pushing log msg for ${dir_key}$qunk{'filename'})\n");
625
 
626
      # Store with the files in this commit.  Later we'll loop through
627
      # again, making sure that revisions with the same log message
628
      # and nearby commit times are grouped together as one commit.
629
      push (@{$grand_poobah{$dir_key}{$author}{$time}{$msg_txt}}, \%qunk);
630
    }
631
 
632
  CLEAR:
633
    # Make way for the next message
634
    undef $msg_txt;
635
    undef $time;
636
    undef $revision;
637
    undef $author;
638
    undef @branch_roots;
639
 
640
    # Maybe even make way for the next file:
641
    if ($detected_file_separator) {
642
      undef $file_full_path;
643
      undef %branch_names;
644
      undef %branch_numbers;
645
      undef %symbolic_names;
646
    }
647
  }
648
 
649
  close (LOG_SOURCE);
650
 
651
  ### Process each ChangeLog
652
 
653
  while (my ($dir,$authorhash) = each %grand_poobah)
654
  {
655
    &debug ("DOING DIR: $dir\n");
656
 
657
    # Here we twist our hash around, from being
658
    #   author => time => message => filelist
659
    # in %$authorhash to
660
    #   time => author => message => filelist
661
    # in %changelog.  
662
    #
663
    # This is also where we merge entries.  The algorithm proceeds
664
    # through the timeline of the changelog with a sliding window of
665
    # $Max_Checkin_Duration seconds; within that window, entries that
666
    # have the same log message are merged.
667
    #
668
    # (To save space, we zap %$authorhash after we've copied
669
    # everything out of it.) 
670
 
671
    my %changelog;
672
    while (my ($author,$timehash) = each %$authorhash)
673
    {
674
      my $lasttime;
675
      my %stamptime;
676
      foreach my $time (sort {$main::a <=> $main::b} (keys %$timehash))
677
      {
678
        my $msghash = $timehash->{$time};
679
        while (my ($msg,$qunklist) = each %$msghash)
680
        {
681
 	  my $stamptime = $stamptime{$msg};
682
          if ((defined $stamptime)
683
              and (($time - $stamptime) < $Max_Checkin_Duration)
684
              and (defined $changelog{$stamptime}{$author}{$msg}))
685
          {
686
 	    push(@{$changelog{$stamptime}{$author}{$msg}}, @$qunklist);
687
          }
688
          else {
689
            $changelog{$time}{$author}{$msg} = $qunklist;
690
            $stamptime{$msg} = $time;
691
          }
692
        }
693
      }
694
    }
695
    undef (%$authorhash);
696
 
697
    ### Now we can write out the ChangeLog!
698
 
699
    my ($logfile_here, $logfile_bak, $tmpfile);
700
 
701
    if (! $Output_To_Stdout) {
702
      $logfile_here =  $dir . $Log_File_Name;
703
      $logfile_here =~ s/^\.\/\//\//;   # fix any leading ".//" problem
704
      $tmpfile      = "${logfile_here}.cvs2cl$$.tmp";
705
      $logfile_bak  = "${logfile_here}.bak";
706
 
707
      open (LOG_OUT, ">$tmpfile") or die "Unable to open \"$tmpfile\"";
708
    }
709
    else {
710
      open (LOG_OUT, ">-") or die "Unable to open stdout for writing";
711
    }
712
 
713
    print LOG_OUT $ChangeLog_Header;
714
 
715
    if ($XML_Output) {
716
      print LOG_OUT "<?xml version=\"1.0\"?>\n\n"
717
          . "<changelog xmlns=\"http://www.red-bean.com/xmlns/cvs2cl/\">\n\n";
718
    }
719
 
720
    foreach my $time (sort {$main::b <=> $main::a} (keys %changelog))
721
    {
722
      my $authorhash = $changelog{$time};
723
      while (my ($author,$mesghash) = each %$authorhash)
724
      {
725
        # If XML, escape in outer loop to avoid compound quoting:
726
        if ($XML_Output) {
727
          $author = &xml_escape ($author);
728
        }
729
 
730
        while (my ($msg,$qunklist) = each %$mesghash)
731
        {
732
          my $files               = &pretty_file_list ($qunklist);
733
          my $header_line;          # date and author
734
          my $body;                 # see below
735
          my $wholething;           # $header_line + $body
736
 
737
          # Set up the date/author line.
738
          # kff todo: do some more XML munging here, on the header
739
          # part of the entry:
740
          my ($ignore,$min,$hour,$mday,$mon,$year,$wday)
741
              = $UTC_Times ? gmtime($time) : localtime($time);
742
 
743
          # XML output includes everything else, we might as well make
744
          # it always include Day Of Week too, for consistency.
745
          if ($Show_Day_Of_Week or $XML_Output) {
746
            $wday = ("Sunday", "Monday", "Tuesday", "Wednesday",
747
                     "Thursday", "Friday", "Saturday")[$wday];
748
            $wday = ($XML_Output) ? "<weekday>${wday}</weekday>\n" : " $wday";
749
          }
750
          else {
751
            $wday = "";
752
          }
753
 
754
          if ($XML_Output) {
755
            $header_line = 
756
                sprintf ("<date>%4u-%02u-%02u</date>\n"
757
                         . "${wday}"
758
                         . "<time>%02u:%02u</time>\n"
759
                         . "<author>%s</author>\n",
760
                         $year+1900, $mon+1, $mday, $hour, $min, $author);
761
          }
762
          else {
763
            $header_line = 
764
                sprintf ("%4u-%02u-%02u${wday} %02u:%02u  %s\n\n",
765
                         $year+1900, $mon+1, $mday, $hour, $min, $author);
766
          }
767
 
768
          # Reshape the body according to user preferences.
769
          if ($XML_Output) 
770
          {
771
            $msg = &preprocess_msg_text ($msg);
772
            $body = $files . $msg;
773
          }
774
          elsif ($No_Wrap) 
775
          {
776
            $msg = &preprocess_msg_text ($msg);
777
            $files = wrap ("\t", "	", "$files");
778
            $msg =~ s/\n(.*)/\n\t$1/g;
779
            unless ($After_Header eq " ") {
780
              $msg =~ s/^(.*)/\t$1/g;
781
            }
782
            $body = $files . $After_Header . $msg;
783
          }
784
          else  # do wrapping, either FSF-style or regular
785
          {
786
            if ($FSF_Style)
787
            {
788
              $files = wrap ("\t", "        ", "$files");
789
 
790
              my $files_last_line_len = 0;
791
              if ($After_Header eq " ")
792
              {
793
                $files_last_line_len = &last_line_len ($files);
794
                $files_last_line_len += 1;  # for $After_Header
795
              }
796
 
797
              $msg = &wrap_log_entry
798
                  ($msg, "\t", 69 - $files_last_line_len, 69);
799
              $body = $files . $After_Header . $msg;
800
            }
801
            else  # not FSF-style
802
            {
803
              $msg = &preprocess_msg_text ($msg);
804
              $body = $files . $After_Header . $msg;
805
              $body = wrap ("\t", "        ", "$body");
806
            }
807
          }
808
 
809
          $wholething = $header_line . $body;
810
 
811
          if ($XML_Output) {
812
            $wholething = "<entry>\n${wholething}</entry>\n";
813
          }
814
 
815
          # One last check: make sure it passes the regexp test, if the
816
          # user asked for that.  We have to do it here, so that the
817
          # test can match against information in the header as well
818
          # as in the text of the log message.
819
 
820
          # How annoying to duplicate so much code just because I
821
          # can't figure out a way to evaluate scalars on the trailing
822
          # operator portion of a regular expression.  Grrr.
823
          if ($Case_Insensitive) {
824
            unless ($Regexp_Gate && ($wholething !~ /$Regexp_Gate/oi)) { 
825
              print LOG_OUT "${wholething}\n";
826
            }
827
          }
828
          else {
829
            unless ($Regexp_Gate && ($wholething !~ /$Regexp_Gate/o)) { 
830
              print LOG_OUT "${wholething}\n";
831
            }
832
          }
833
        }
834
      }
835
    }
836
 
837
    if ($XML_Output) {
838
      print LOG_OUT "</changelog>\n";
839
    }
840
 
841
    close (LOG_OUT);
842
 
843
    if (! $Output_To_Stdout) 
844
    {
845
      # If accumulating, append old data to new before renaming.  But
846
      # don't append the most recent entry, since it's already in the
847
      # new log due to CVS's idiosyncratic interpretation of "log -d".
848
      if ($Cumulative && -f $logfile_here)
849
      {
850
        open (NEW_LOG, ">>$tmpfile")
851
            or die "trouble appending to $tmpfile ($!)";
852
 
853
        open (OLD_LOG, "<$logfile_here")
854
            or die "trouble reading from $logfile_here ($!)";
855
 
856
        my $started_first_entry = 0;
857
        my $passed_first_entry = 0;
858
        while (<OLD_LOG>) 
859
        {
860
          if (! $passed_first_entry)
861
          {
862
            if ((! $started_first_entry)
863
                && /^(\d\d\d\d-\d\d-\d\d\s+\d\d:\d\d)/) {
864
              $started_first_entry = 1;
865
            }
866
            elsif (/^(\d\d\d\d-\d\d-\d\d\s+\d\d:\d\d)/) {
867
              $passed_first_entry = 1;
868
              print NEW_LOG $_;
869
            }
870
          }
871
          else {
872
            print NEW_LOG $_;
873
          }
874
        }
875
 
876
        close (NEW_LOG);
877
        close (OLD_LOG);
878
      }
879
 
880
      if (-f $logfile_here) {
881
        rename ($logfile_here, $logfile_bak);
882
      } 
883
      rename ($tmpfile, $logfile_here);
884
    }
885
  }
886
}
887
 
888
 
889
sub parse_date_and_author ()
890
{
891
  # Parses the date/time and author out of a line like: 
892
  #
893
  # date: 1999/02/19 23:29:05;  author: apharris;  state: Exp;
894
 
895
  my $line = shift;
896
 
897
  my ($year, $mon, $mday, $hours, $min, $secs, $author) = $line =~
898
      m#(\d+)/(\d+)/(\d+)\s+(\d+):(\d+):(\d+);\s+author:\s+([^;]+);#
899
          or  die "Couldn't parse date ``$line''";
900
  die "Bad date or Y2K issues" unless ($year > 1969 and $year < 2258);
901
  # Kinda arbitrary, but useful as a sanity check
902
  my $time = timegm($secs,$min,$hours,$mday,$mon-1,$year-1900);
903
 
904
  return ($time, $author);
905
}
906
 
907
 
908
# Here we take a bunch of qunks and convert them into printed
909
# summary that will include all the information the user asked for.
910
sub pretty_file_list ()
911
{
912
  if ($Hide_Filenames and (! $XML_Output)) {
913
    return "";
914
  }
915
 
916
  my $qunksref = shift;
917
  my @qunkrefs = @$qunksref;
918
  my @filenames;
919
  my $beauty = "";          # The accumulating header string for this entry.
920
  my %non_unanimous_tags;   # Tags found in a proper subset of qunks
921
  my %unanimous_tags;       # Tags found in all qunks
922
  my %all_branches;         # Branches found in any qunk
923
  my $common_dir = undef;   # Dir prefix common to all files ("" if none)
924
  my $fbegun = 0;           # Did we begin printing filenames yet?
925
 
926
  # First, loop over the qunks gathering all the tag/branch names.
927
  # We'll put them all in non_unanimous_tags, and take out the
928
  # unanimous ones later.
929
  foreach my $qunkref (@qunkrefs) 
930
  {
931
    # Keep track of whether all the files in this commit were in the
932
    # same directory, and memorize it if so.  We can make the output a
933
    # little more compact by mentioning the directory only once.
934
    if ((scalar (@qunkrefs)) > 1)
935
    {
936
      if (! (defined ($common_dir)))
937
      {
938
        my ($base, $dir);
939
        ($base, $dir, undef) = fileparse ($$qunkref{'filename'});
940
 
941
        if ((! (defined ($dir)))  # this first case is sheer paranoia
942
            or ($dir eq "")
943
            or ($dir eq "./")
944
            or ($dir eq ".\\")) 
945
        {
946
          $common_dir = "";
947
        }
948
        else
949
        {
950
          $common_dir = $dir;
951
        }
952
      }
953
      elsif ($common_dir ne "")
954
      {
955
        # Already have a common dir prefix, so how much of it can we preserve?
956
        $common_dir = &common_path_prefix ($$qunkref{'filename'}, $common_dir);
957
      }
958
    }
959
    else  # only one file in this entry anyway, so common dir not an issue
960
    {
961
      $common_dir = "";
962
    }
963
 
964
    if (defined ($$qunkref{'branch'})) {
965
      $all_branches{$$qunkref{'branch'}} = 1;
966
    }
967
    if (defined ($$qunkref{'tags'})) {
968
      foreach my $tag (@{$$qunkref{'tags'}}) {
969
        $non_unanimous_tags{$tag} = 1;
970
      }
971
    }
972
  }
973
 
974
  # Any tag held by all qunks will be printed specially... but only if
975
  # there are multiple qunks in the first place!
976
  if ((scalar (@qunkrefs)) > 1) {
977
    foreach my $tag (keys (%non_unanimous_tags)) {
978
      my $everyone_has_this_tag = 1;
979
      foreach my $qunkref (@qunkrefs) {
980
        if ((! (defined ($$qunkref{'tags'})))
981
            or (! (grep ($_ eq $tag, @{$$qunkref{'tags'}})))) {
982
          $everyone_has_this_tag = 0;
983
        }
984
      }
985
      if ($everyone_has_this_tag) {
986
        $unanimous_tags{$tag} = 1;
987
        delete $non_unanimous_tags{$tag};
988
      }
989
    }
990
  }
991
 
992
  if ($XML_Output)
993
  {
994
    # If outputting XML, then our task is pretty simple, because we
995
    # don't have to detect common dir, common tags, branch prefixing,
996
    # etc.  We just output exactly what we have, and don't worry about
997
    # redundancy or readability.
998
 
999
    foreach my $qunkref (@qunkrefs) 
1000
    {
1001
      my $filename    = $$qunkref{'filename'};
1002
      my $revision    = $$qunkref{'revision'};
1003
      my $tags        = $$qunkref{'tags'};
1004
      my $branch      = $$qunkref{'branch'};
1005
      my $branchroots = $$qunkref{'branchroots'};
1006
 
1007
      $filename = &xml_escape ($filename);   # probably paranoia
1008
      $revision = &xml_escape ($revision);   # definitely paranoia
1009
 
1010
      $beauty .= "<file>\n";
1011
      $beauty .= "<name>${filename}</name>\n";
1012
      $beauty .= "<revision>${revision}</revision>\n";
1013
      if ($branch) {
1014
        $branch   = &xml_escape ($branch);     # more paranoia
1015
        $beauty .= "<branch>${branch}</branch>\n";
1016
      }
1017
      foreach my $tag (@$tags) {
1018
        $tag = &xml_escape ($tag);  # by now you're used to the paranoia
1019
        $beauty .= "<tag>${tag}</tag>\n";
1020
      }
1021
      foreach my $root (@$branchroots) {
1022
        $root = &xml_escape ($root);  # which is good, because it will continue
1023
        $beauty .= "<branchroot>${root}</branchroot>\n";
1024
      }
1025
      $beauty .= "</file>\n";
1026
    }
1027
 
1028
    # Theoretically, we could go home now.  But as long as we're here,
1029
    # let's print out the common_dir and utags, as a convenience to
1030
    # the receiver (after all, earlier code calculated that stuff
1031
    # anyway, so we might as well take advantage of it).
1032
 
1033
    if ((scalar (keys (%unanimous_tags))) > 1) {
1034
      foreach my $utag ((keys (%unanimous_tags))) {
1035
        $utag = &xml_escape ($utag);   # the usual paranoia
1036
        $beauty .= "<utag>${utag}</utag>\n";
1037
      }
1038
    }
1039
    if ($common_dir) {
1040
      $common_dir = &xml_escape ($common_dir);
1041
      $beauty .= "<commondir>${common_dir}</commondir>\n";
1042
    }
1043
 
1044
    # That's enough for XML, time to go home:
1045
    return $beauty;
1046
  }
1047
 
1048
  # Else not XML output, so complexly compactify for chordate
1049
  # consumption.  At this point we have enough global information
1050
  # about all the qunks to organize them non-redundantly for output.
1051
 
1052
  if ($common_dir) {
1053
    # Note that $common_dir still has its trailing slash
1054
    $beauty .= "$common_dir: ";
1055
  }
1056
 
1057
  if ($Show_Branches)
1058
  {
1059
    # For trailing revision numbers.
1060
    my @brevisions;
1061
 
1062
    foreach my $branch (keys (%all_branches))
1063
    {
1064
      foreach my $qunkref (@qunkrefs)
1065
      {
1066
        if ((defined ($$qunkref{'branch'}))
1067
            and ($$qunkref{'branch'} eq $branch))
1068
        {
1069
          if ($fbegun) {
1070
            # kff todo: comma-delimited in XML too?  Sure.
1071
            $beauty .= ", ";
1072
          } 
1073
          else {
1074
            $fbegun = 1;
1075
          }
1076
          my $fname = substr ($$qunkref{'filename'}, length ($common_dir));
1077
          $beauty .= $fname;
1078
          $$qunkref{'printed'} = 1;  # Just setting a mark bit, basically
1079
 
1080
          if ($Show_Tags && (defined @{$$qunkref{'tags'}})) {
1081
            my @tags = grep ($non_unanimous_tags{$_}, @{$$qunkref{'tags'}});
1082
            if (@tags) {
1083
              $beauty .= " (tags: ";
1084
              $beauty .= join (', ', @tags);
1085
              $beauty .= ")";
1086
            }
1087
          }
1088
 
1089
          if ($Show_Revisions) {
1090
            # Collect the revision numbers' last components, but don't
1091
            # print them -- they'll get printed with the branch name
1092
            # later.
1093
            $$qunkref{'revision'} =~ /.+\.([\d]+)$/;
1094
            push (@brevisions, $1);
1095
 
1096
            # todo: we're still collecting branch roots, but we're not
1097
            # showing them anywhere.  If we do show them, it would be
1098
            # nifty to just call them revision "0" on a the branch.
1099
            # Yeah, that's the ticket.
1100
          }
1101
        }
1102
      }
1103
      $beauty .= " ($branch";
1104
      if (@brevisions) {
1105
        if ((scalar (@brevisions)) > 1) {
1106
          $beauty .= ".[";
1107
          $beauty .= (join (',', @brevisions));
1108
          $beauty .= "]";
1109
        }
1110
        else {
1111
          $beauty .= ".$brevisions[0]";
1112
        }
1113
      }
1114
      $beauty .= ")";
1115
    }
1116
  }
1117
 
1118
  # Okay; any qunks that were done according to branch are taken care
1119
  # of, and marked as printed.  Now print everyone else.
1120
 
1121
  foreach my $qunkref (@qunkrefs)
1122
  {
1123
    next if (defined ($$qunkref{'printed'}));   # skip if already printed
1124
 
1125
    if ($fbegun) {
1126
      $beauty .= ", ";
1127
    }
1128
    else {
1129
      $fbegun = 1;
1130
    }
1131
    $beauty .= substr ($$qunkref{'filename'}, length ($common_dir));
1132
    # todo: Shlomo's change was this:
1133
    # $beauty .= substr ($$qunkref{'filename'}, 
1134
    #              (($common_dir eq "./") ? "" : length ($common_dir)));
1135
    $$qunkref{'printed'} = 1;  # Set a mark bit.
1136
 
1137
    if ($Show_Revisions || $Show_Tags)
1138
    {
1139
      my $started_addendum = 0;
1140
 
1141
      if ($Show_Revisions) {
1142
        $started_addendum = 1;
1143
        $beauty .= " (";
1144
        $beauty .= "$$qunkref{'revision'}";
1145
      }
1146
      if ($Show_Tags && (defined $$qunkref{'tags'})) {
1147
        my @tags = grep ($non_unanimous_tags{$_}, @{$$qunkref{'tags'}});
1148
        if ((scalar (@tags)) > 0) {
1149
          if ($started_addendum) {
1150
            $beauty .= ", ";
1151
          }
1152
          else {
1153
            $beauty .= " (tags: ";
1154
          }
1155
          $beauty .= join (', ', @tags);
1156
          $started_addendum = 1;
1157
        }
1158
      }
1159
      if ($started_addendum) {
1160
        $beauty .= ")";
1161
      }
1162
    }
1163
  }
1164
 
1165
  # Unanimous tags always come last.
1166
  if ($Show_Tags && %unanimous_tags)
1167
  {
1168
    $beauty .= " (utags: ";
1169
    $beauty .= join (', ', keys (%unanimous_tags));
1170
    $beauty .= ")";
1171
  }
1172
 
1173
  # todo: still have to take care of branch_roots?
1174
 
1175
  $beauty = "* $beauty:";
1176
 
1177
  return $beauty;
1178
}
1179
 
1180
 
1181
sub common_path_prefix ()
1182
{
1183
  my $path1 = shift;
1184
  my $path2 = shift;
1185
 
1186
  my ($dir1, $dir2);
1187
  (undef, $dir1, undef) = fileparse ($path1);
1188
  (undef, $dir2, undef) = fileparse ($path2);
1189
 
1190
  # Transmogrify Windows filenames to look like Unix.  
1191
  # (It is far more likely that someone is running cvs2cl.pl under
1192
  # Windows than that they would genuinely have backslashes in their
1193
  # filenames.)
1194
  $dir1 =~ tr#\\#/#;
1195
  $dir2 =~ tr#\\#/#;
1196
 
1197
  my $accum1 = "";
1198
  my $accum2 = "";
1199
  my $last_common_prefix = "";
1200
 
1201
  while ($accum1 eq $accum2)
1202
  {
1203
    $last_common_prefix = $accum1;
1204
    last if ($accum1 eq $dir1);
1205
    my ($tmp1) = split (/\//, (substr ($dir1, length ($accum1))));
1206
    my ($tmp2) = split (/\//, (substr ($dir2, length ($accum2))));
1207
    $accum1 .= "$tmp1/" if ((defined ($tmp1)) and $tmp1);
1208
    $accum2 .= "$tmp2/" if ((defined ($tmp2)) and $tmp2);
1209
  }
1210
 
1211
  return $last_common_prefix;
1212
}
1213
 
1214
 
1215
sub preprocess_msg_text ()
1216
{
1217
  my $text = shift;
1218
 
1219
  # Strip out carriage returns (as they probably result from DOSsy editors).
1220
  $text =~ s/\r\n/\n/g;
1221
 
1222
  # If it *looks* like two newlines, make it *be* two newlines:
1223
  $text =~ s/\n\s*\n/\n\n/g;
1224
 
1225
  if ($XML_Output)
1226
  {
1227
    $text = &xml_escape ($text);
1228
    $text = "<msg>${text}</msg>\n";
1229
  }
1230
  elsif (! $No_Wrap)
1231
  {
1232
    # Strip off lone newlines, but only for lines that don't begin with
1233
    # whitespace or a mail-quoting character, since we want to preserve
1234
    # that kind of formatting.  Also don't strip newlines that follow a
1235
    # period; we handle those specially next.  And don't strip
1236
    # newlines that precede an open paren.
1237
    1 while ($text =~ s/(^|\n)([^>\s].*[^.\n])\n([^>\n])/$1$2 $3/g);
1238
 
1239
    # If a newline follows a period, make sure that when we bring up the
1240
    # bottom sentence, it begins with two spaces. 
1241
    1 while ($text =~ s/(^|\n)([^>\s].*)\n([^>\n])/$1$2  $3/g);
1242
  }
1243
 
1244
  return $text;
1245
}
1246
 
1247
 
1248
sub last_line_len ()
1249
{
1250
  my $files_list = shift;
1251
  my @lines = split (/\n/, $files_list);
1252
  my $last_line = pop (@lines);
1253
  return length ($last_line);
1254
}
1255
 
1256
 
1257
# A custom wrap function, sensitive to some common constructs used in
1258
# log entries.
1259
sub wrap_log_entry ()
1260
{
1261
  my $text = shift;                  # The text to wrap.
1262
  my $left_pad_str = shift;          # String to pad with on the left.
1263
 
1264
  # These do NOT take left_pad_str into account:
1265
  my $length_remaining = shift;      # Amount left on current line.
1266
  my $max_line_length  = shift;      # Amount left for a blank line.
1267
 
1268
  my $wrapped_text = "";             # The accumulating wrapped entry.
1269
  my $user_indent = "";              # Inherited user_indent from prev line.
1270
 
1271
  my $first_time = 1;                # First iteration of the loop?
1272
  my $suppress_line_start_match = 0; # Set to disable line start checks.
1273
 
1274
  my @lines = split (/\n/, $text);
1275
  while (@lines)   # Don't use `foreach' here, it won't work.
1276
  {
1277
    my $this_line = shift (@lines);
1278
    chomp $this_line;
1279
 
1280
    if ($this_line =~ /^(\s+)/) {
1281
      $user_indent = $1;
1282
    }
1283
    else {
1284
      $user_indent = "";
1285
    }
1286
 
1287
    # If it matches any of the line-start regexps, print a newline now...
1288
    if ($suppress_line_start_match)
1289
    {
1290
      $suppress_line_start_match = 0;
1291
    }
1292
    elsif (($this_line =~ /^(\s*)\*\s+[a-zA-Z0-9]/)
1293
           || ($this_line =~ /^(\s*)\* [a-zA-Z0-9_\.\/\+-]+/)
1294
           || ($this_line =~ /^(\s*)\([a-zA-Z0-9_\.\/\+-]+(\)|,\s*)/)
1295
           || ($this_line =~ /^(\s+)(\S+)/)
1296
           || ($this_line =~ /^(\s*)- +/)
1297
           || ($this_line =~ /^()\s*$/)
1298
           || ($this_line =~ /^(\s*)\*\) +/)
1299
           || ($this_line =~ /^(\s*)[a-zA-Z0-9](\)|\.|\:) +/))
1300
    {
1301
      # Make a line break immediately, unless header separator is set
1302
      # and this line is the first line in the entry, in which case
1303
      # we're getting the blank line for free already and shouldn't
1304
      # add an extra one.
1305
      unless (($After_Header ne " ") and ($first_time))
1306
      {
1307
        if ($this_line =~ /^()\s*$/) {
1308
          $suppress_line_start_match = 1;
1309
          $wrapped_text .= "\n${left_pad_str}";
1310
        }
1311
 
1312
        $wrapped_text .= "\n${left_pad_str}";
1313
      }
1314
 
1315
      $length_remaining = $max_line_length - (length ($user_indent));
1316
    }
1317
 
1318
    # Now that any user_indent has been preserved, strip off leading
1319
    # whitespace, so up-folding has no ugly side-effects.
1320
    $this_line =~ s/^\s*//;
1321
 
1322
    # Accumulate the line, and adjust parameters for next line.
1323
    my $this_len = length ($this_line);
1324
    if ($this_len == 0)
1325
    {
1326
      # Blank lines should cancel any user_indent level.
1327
      $user_indent = "";
1328
      $length_remaining = $max_line_length;
1329
    }
1330
    elsif ($this_len >= $length_remaining) # Line too long, try breaking it.
1331
    {
1332
      # Walk backwards from the end.  At first acceptable spot, break
1333
      # a new line.
1334
      my $idx = $length_remaining - 1;
1335
      if ($idx < 0) { $idx = 0 };
1336
      while ($idx > 0)
1337
      {
1338
        if (substr ($this_line, $idx, 1) =~ /\s/)
1339
        {
1340
          my $line_now = substr ($this_line, 0, $idx);
1341
          my $next_line = substr ($this_line, $idx);
1342
          $this_line = $line_now;
1343
 
1344
          # Clean whitespace off the end.
1345
          chomp $this_line;
1346
 
1347
          # The current line is ready to be printed.
1348
          $this_line .= "\n${left_pad_str}";
1349
 
1350
          # Make sure the next line is allowed full room.
1351
          $length_remaining = $max_line_length - (length ($user_indent));
1352
 
1353
          # Strip next_line, but then preserve any user_indent.
1354
          $next_line =~ s/^\s*//;
1355
 
1356
          # Sneak a peek at the user_indent of the upcoming line, so
1357
          # $next_line (which will now precede it) can inherit that
1358
          # indent level.  Otherwise, use whatever user_indent level
1359
          # we currently have, which might be none.
1360
          my $next_next_line = shift (@lines);
1361
          if ((defined ($next_next_line)) && ($next_next_line =~ /^(\s+)/)) {
1362
            $next_line = $1 . $next_line if (defined ($1));
1363
            # $length_remaining = $max_line_length - (length ($1));
1364
            $next_next_line =~ s/^\s*//;
1365
          }
1366
          else {
1367
            $next_line = $user_indent . $next_line;
1368
          }
1369
          if (defined ($next_next_line)) {
1370
            unshift (@lines, $next_next_line);
1371
          }
1372
          unshift (@lines, $next_line);
1373
 
1374
          # Our new next line might, coincidentally, begin with one of
1375
          # the line-start regexps, so we temporarily turn off
1376
          # sensitivity to that until we're past the line.
1377
          $suppress_line_start_match = 1; 
1378
 
1379
          last;
1380
        }
1381
        else
1382
        {
1383
          $idx--;
1384
        }
1385
      }
1386
 
1387
      if ($idx == 0)
1388
      {
1389
        # We bottomed out because the line is longer than the
1390
        # available space.  But that could be because the space is
1391
        # small, or because the line is longer than even the maximum
1392
        # possible space.  Handle both cases below.
1393
 
1394
        if ($length_remaining == ($max_line_length - (length ($user_indent))))
1395
        {
1396
          # The line is simply too long -- there is no hope of ever
1397
          # breaking it nicely, so just insert it verbatim, with
1398
          # appropriate padding.
1399
          $this_line = "\n${left_pad_str}${this_line}";
1400
        }
1401
        else
1402
        {
1403
          # Can't break it here, but may be able to on the next round...
1404
          unshift (@lines, $this_line);
1405
          $length_remaining = $max_line_length - (length ($user_indent));
1406
          $this_line = "\n${left_pad_str}";
1407
        }
1408
      }
1409
    }
1410
    else  # $this_len < $length_remaining, so tack on what we can.
1411
    {
1412
      # Leave a note for the next iteration.
1413
      $length_remaining = $length_remaining - $this_len;
1414
 
1415
      if ($this_line =~ /\.$/)
1416
      {
1417
        $this_line .= "  ";
1418
        $length_remaining -= 2;
1419
      }
1420
      else  # not a sentence end
1421
      {
1422
        $this_line .= " ";
1423
        $length_remaining -= 1;
1424
      }
1425
    }
1426
 
1427
    # Unconditionally indicate that loop has run at least once.
1428
    $first_time = 0;
1429
 
1430
    $wrapped_text .= "${user_indent}${this_line}";
1431
  }
1432
 
1433
  # One last bit of padding.
1434
  $wrapped_text .= "\n";
1435
 
1436
  return $wrapped_text;
1437
}
1438
 
1439
 
1440
sub xml_escape ()
1441
{
1442
  my $txt = shift;
1443
  $txt =~ s/&/&amp;/g;
1444
  $txt =~ s/</&lt;/g;
1445
  $txt =~ s/>/&gt;/g;
1446
  return $txt;
1447
}
1448
 
1449
 
1450
sub maybe_read_user_map_file ()
1451
{
1452
  my %expansions;
1453
 
1454
  if ($User_Map_File)
1455
  {
1456
    open (MAPFILE, "<$User_Map_File")
1457
        or die ("Unable to open $User_Map_File ($!)");
1458
 
1459
    while (<MAPFILE>) 
1460
    {
1461
      next if /^\s*#/;  # Skip comment lines.
1462
      next if not /:/;  # Skip lines without colons.
1463
 
1464
      # It is now safe to split on ':'.
1465
      my ($username, $expansion) = split ':';
1466
      chomp $expansion;
1467
      $expansion =~ s/^'(.*)'$/$1/;
1468
      $expansion =~ s/^"(.*)"$/$1/;
1469
 
1470
      # If it looks like the expansion has a real name already, then
1471
      # we toss the username we got from CVS log.  Otherwise, keep
1472
      # it to use in combination with the email address.
1473
 
1474
      if ($expansion =~ /^\s*<{0,1}\S+@.*/) {
1475
        # Also, add angle brackets if none present
1476
        if (! ($expansion =~ /<\S+@\S+>/)) {
1477
          $expansions{$username} = "$username <$expansion>";
1478
        }
1479
        else {
1480
          $expansions{$username} = "$username $expansion";
1481
        }
1482
      }
1483
      else {
1484
        $expansions{$username} = $expansion;
1485
      }
1486
    }
1487
 
1488
    close (MAPFILE);
1489
  }
1490
 
1491
  return %expansions;
1492
}
1493
 
1494
 
1495
sub parse_options ()
1496
{
1497
  # Check this internally before setting the global variable.
1498
  my $output_file;
1499
 
1500
  # If this gets set, we encountered unknown options and will exit at
1501
  # the end of this subroutine.
1502
  my $exit_with_admonishment = 0;
1503
 
1504
  while (my $arg = shift (@ARGV)) 
1505
  {
1506
    if ($arg =~ /^-h$|^-help$|^--help$|^--usage$|^-?$/) {
1507
      $Print_Usage = 1;
1508
    }
1509
    elsif ($arg =~ /^--debug$/) {        # unadvertised option, heh
1510
      $Debug = 1;
1511
    }
1512
    elsif ($arg =~ /^--version$/) {
1513
      $Print_Version = 1;
1514
    }
1515
    elsif ($arg =~ /^-g$|^--global-opts$/) {
1516
      my $narg = shift (@ARGV) || die "$arg needs argument.\n";
1517
      # Don't assume CVS is called "cvs" on the user's system:
1518
      $Log_Source_Command =~ s/(^\S*)/$1 $narg/;
1519
    }
1520
    elsif ($arg =~ /^-l$|^--log-opts$/) {
1521
      my $narg = shift (@ARGV) || die "$arg needs argument.\n";
1522
      $Log_Source_Command .= " $narg";
1523
    }
1524
    elsif ($arg =~ /^-f$|^--file$/) {
1525
      my $narg = shift (@ARGV) || die "$arg needs argument.\n";
1526
      $output_file = $narg;
1527
    }
1528
    elsif ($arg =~ /^--accum$/) {
1529
      $Cumulative = 1;
1530
    }
1531
    elsif ($arg =~ /^--fsf$/) {
1532
      $FSF_Style = 1;
1533
    }
1534
    elsif ($arg =~ /^-U$|^--usermap$/) {
1535
      my $narg = shift (@ARGV) || die "$arg needs argument.\n";
1536
      $User_Map_File = $narg;
1537
    }
1538
    elsif ($arg =~ /^-W$|^--window$/) {
1539
      my $narg = shift (@ARGV) || die "$arg needs argument.\n";
1540
      $Max_Checkin_Duration = $narg;
1541
    }
1542
    elsif ($arg =~ /^-I$|^--ignore$/) {
1543
      my $narg = shift (@ARGV) || die "$arg needs argument.\n";
1544
      push (@Ignore_Files, $narg);
1545
    }
1546
    elsif ($arg =~ /^-C$|^--case-insensitive$/) {
1547
      $Case_Insensitive = 1;
1548
    }
1549
    elsif ($arg =~ /^-R$|^--regexp$/) {
1550
      my $narg = shift (@ARGV) || die "$arg needs argument.\n";
1551
      $Regexp_Gate = $narg;
1552
    }
1553
    elsif ($arg =~ /^--stdout$/) {
1554
      $Output_To_Stdout = 1;
1555
    }
1556
    elsif ($arg =~ /^--version$/) {
1557
      $Print_Version = 1;
1558
    }
1559
    elsif ($arg =~ /^-d$|^--distributed$/) {
1560
      $Distributed = 1;
1561
    }
1562
    elsif ($arg =~ /^-P$|^--prune$/) {
1563
      $Prune_Empty_Msgs = 1;
1564
    }
1565
    elsif ($arg =~ /^-S$|^--separate-header$/) {
1566
      $After_Header = "\n\n";
1567
    }
1568
    elsif ($arg =~ /^--no-wrap$/) {
1569
      $No_Wrap = 1;
1570
    }
1571
    elsif ($arg =~ /^--gmt$|^--utc$/) {
1572
      $UTC_Times = 1;
1573
    }
1574
    elsif ($arg =~ /^-w$|^--day-of-week$/) {
1575
      $Show_Day_Of_Week = 1;
1576
    }
1577
    elsif ($arg =~ /^-r$|^--revisions$/) {
1578
      $Show_Revisions = 1;
1579
    }
1580
    elsif ($arg =~ /^-t$|^--tags$/) {
1581
      $Show_Tags = 1;
1582
    }
1583
    elsif ($arg =~ /^-b$|^--branches$/) {
1584
      $Show_Branches = 1;
1585
    }
1586
    elsif ($arg =~ /^-F$|^--follow$/) {
1587
      my $narg = shift (@ARGV) || die "$arg needs argument.\n";
1588
      push (@Follow_Branches, $narg);
1589
    }
1590
    elsif ($arg =~ /^--stdin$/) {
1591
      $Input_From_Stdin = 1;
1592
    }
1593
    elsif ($arg =~ /^--header$/) {
1594
      my $narg = shift (@ARGV) || die "$arg needs argument.\n";
1595
      $ChangeLog_Header = &slurp_file ($narg);
1596
      if (! defined ($ChangeLog_Header)) {
1597
        $ChangeLog_Header = "";
1598
      }
1599
    }
1600
    elsif ($arg =~ /^--xml$/) {
1601
      $XML_Output = 1;
1602
    }
1603
    elsif ($arg =~ /^--hide-filenames$/) {
1604
      $Hide_Filenames = 1;
1605
      $After_Header = "";
1606
    }
1607
    else {
1608
      # Just add a filename as argument to the log command
1609
      $Log_Source_Command .= " $arg";
1610
    }
1611
  }
1612
 
1613
  ## Check for contradictions...
1614
 
1615
  if ($Output_To_Stdout && $Distributed) {
1616
    print STDERR "cannot pass both --stdout and --distributed\n";
1617
    $exit_with_admonishment = 1;
1618
  }
1619
 
1620
  if ($Output_To_Stdout && $output_file) {
1621
    print STDERR "cannot pass both --stdout and --file\n";
1622
    $exit_with_admonishment = 1;
1623
  }
1624
 
1625
  if ($XML_Output && $Cumulative) {
1626
    print STDERR "cannot pass both --xml and --accum\n";
1627
    $exit_with_admonishment = 1;
1628
  }
1629
 
1630
  # Or if any other error message has already been printed out, we
1631
  # just leave now:
1632
  if ($exit_with_admonishment) {
1633
    &usage ();
1634
    exit (1);
1635
  }
1636
  elsif ($Print_Usage) {
1637
    &usage ();
1638
    exit (0);
1639
  }
1640
  elsif ($Print_Version) {
1641
    &version ();
1642
    exit (0);
1643
  }
1644
 
1645
  ## Else no problems, so proceed.
1646
 
1647
  if ($output_file) {
1648
    $Log_File_Name = $output_file;
1649
  }
1650
}
1651
 
1652
 
1653
sub slurp_file ()
1654
{
1655
  my $filename = shift || die ("no filename passed to slurp_file()");
1656
  my $retstr;
1657
 
1658
  open (SLURPEE, "<${filename}") or die ("unable to open $filename ($!)");
1659
  my $saved_sep = $/;
1660
  undef $/;
1661
  $retstr = <SLURPEE>;
1662
  $/ = $saved_sep;
1663
  close (SLURPEE);
1664
  return $retstr;
1665
}
1666
 
1667
 
1668
sub debug ()
1669
{
1670
  if ($Debug) {
1671
    my $msg = shift;
1672
    print STDERR $msg;
1673
  }
1674
}
1675
 
1676
 
1677
sub version ()
1678
{
1679
  print "cvs2cl.pl version ${VERSION}; distributed under the GNU GPL.\n";
1680
}
1681
 
1682
 
1683
sub usage ()
1684
{
1685
  &version ();
1686
  print <<'END_OF_INFO';
1687
Generate GNU-style ChangeLogs in CVS working copies.
1688
 
1689
Notes about the output format(s):
1690
 
1691
   The default output of cvs2cl.pl is designed to be compact, formally
1692
   unambiguous, but still easy for humans to read.  It is largely
1693
   self-explanatory, I hope; the one abbreviation that might not be
1694
   obvious is "utags".  That stands for "universal tags" -- a
1695
   universal tag is one held by all the files in a given change entry.
1696
 
1697
   If you need output that's easy for a program to parse, use the
1698
   --xml option.  Note that with XML output, just about all available
1699
   information is included with each change entry, whether you asked
1700
   for it or not, on the theory that your parser can ignore anything
1701
   it's not looking for.
1702
 
1703
Notes about the options and arguments (the actual options are listed
1704
last in this usage message):
1705
 
1706
  * The -I and -F options may appear multiple times.
1707
 
1708
  * To follow trunk revisions, use "-F trunk" ("-F TRUNK" also works).
1709
    This is okay because no would ever, ever be crazy enough to name a
1710
    branch "trunk", right?  Right.
1711
 
1712
  * For the -U option, the UFILE should be formatted like
1713
    CVSROOT/users. That is, each line of UFILE looks like this
1714
       jrandom:jrandom@red-bean.com
1715
    or maybe even like this
1716
       jrandom:'Jesse Q. Random <jrandom@red-bean.com>'
1717
    Don't forget to quote the portion after the colon if necessary.
1718
 
1719
  * Many people want to filter by date.  To do so, invoke cvs2cl.pl
1720
    like this: 
1721
       cvs2cl.pl -l "-d'DATESPEC'"
1722
    where DATESPEC is any date specification valid for "cvs log -d".
1723
    (Note that CVS 1.10.7 and below requires there be no space between
1724
    -d and its argument).
1725
 
1726
Options/Arguments:
1727
 
1728
  -h, -help, --help, or -?     Show this usage and exit
1729
  --version                    Show version and exit
1730
  -r, --revisions              Show revision numbers in output
1731
  -b, --branches               Show branch names in revisions when possible
1732
  -t, --tags                   Show tags (symbolic names) in output
1733
  --stdin                      Read from stdin, don't run cvs log
1734
  --stdout                     Output to stdout not to ChangeLog
1735
  -d, --distributed            Put ChangeLogs in subdirs
1736
  -f FILE, --file FILE         Write to FILE instead of "ChangeLog"
1737
  --fsf                        Use this if log data is in FSF ChangeLog style
1738
  -W SECS, --window SECS       Window of time within which log entries unify
1739
  -U UFILE, --usermap UFILE    Expand usernames to email addresses from UFILE
1740
  -R REGEXP, --regexp REGEXP   Include only entries that match REGEXP
1741
  -I REGEXP, --ignore REGEXP   Ignore files whose names match REGEXP
1742
  -C, --case-insensitive       Any regexp matching is done case-insensitively
1743
  -F BRANCH, --follow BRANCH   Show only revisions on or ancestral to BRANCH
1744
  -S, --separate-header        Blank line between each header and log message
1745
  --no-wrap                    Don't auto-wrap log message (recommend -S also)
1746
  --gmt, --utc                 Show times in GMT/UTC instead of local time
1747
  --accum                      Add to an existing ChangeLog (incompat w/ --xml)
1748
  -w, --day-of-week            Show day of week
1749
  --header FILE                Get ChangeLog header from FILE ("-" means stdin)
1750
  --xml                        Output XML instead of ChangeLog format
1751
  --hide-filenames             Don't show filenames (ignored for XML output)
1752
  -P, --prune                  Don't show empty log messages
1753
  -g OPTS, --global-opts OPTS  Invoke like this "cvs OPTS log ..."
1754
  -l OPTS, --log-opts OPTS     Invoke like this "cvs ... log OPTS"
1755
  FILE1 [FILE2 ...]            Show only log information for the named FILE(s)
1756
 
1757
See http://www.red-bean.com/cvs2cl for maintenance and bug info.
1758
END_OF_INFO
1759
}
1760
 
1761
__END__
1762
 
1763
=head1 NAME
1764
 
1765
cvs2cl.pl - produces GNU-style ChangeLogs in CVS working copies, by
1766
    running "cvs log" and parsing the output.  Shared log entries are
1767
    unified in an intuitive way.
1768
 
1769
=head1 DESCRIPTION
1770
 
1771
This script generates GNU-style ChangeLog files from CVS log
1772
information.  Basic usage: just run it inside a working copy and a
1773
ChangeLog will appear.  It requires repository access (i.e., 'cvs log'
1774
must work).  Run "cvs2cl.pl --help" to see more advanced options.
1775
 
1776
See http://www.red-bean.com/cvs2cl for updates, and for instructions
1777
on getting anonymous CVS access to this script.
1778
 
1779
Maintainer: Karl Fogel <kfogel@red-bean.com>
1780
Please report bugs to <bug-cvs2cl@red-bean.com>.
1781
 
1782
=head1 README
1783
 
1784
This script generates GNU-style ChangeLog files from CVS log
1785
information.  Basic usage: just run it inside a working copy and a
1786
ChangeLog will appear.  It requires repository access (i.e., 'cvs log'
1787
must work).  Run "cvs2cl.pl --help" to see more advanced options.
1788
 
1789
See http://www.red-bean.com/cvs2cl for updates, and for instructions
1790
on getting anonymous CVS access to this script.
1791
 
1792
Maintainer: Karl Fogel <kfogel@red-bean.com>
1793
Please report bugs to <bug-cvs2cl@red-bean.com>.
1794
 
1795
=head1 PREREQUISITES
1796
 
1797
This script requires C<Text::Wrap>, C<Time::Local>, and
1798
C<File::Basename>.
1799
It also seems to require C<Perl 5.004_04> or higher.
1800
 
1801
=pod OSNAMES
1802
 
1803
any
1804
 
1805
=pod SCRIPT CATEGORIES
1806
 
1807
Version_Control/CVS
1808
 
1809
=cut
1810
 
1811
 
1812
-*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*-
1813
 
1814
Note about a bug-slash-opportunity:
1815
-----------------------------------
1816
 
1817
There's a bug in Text::Wrap, which affects cvs2cl.  This script
1818
reveals it:
1819
 
1820
  #!/usr/bin/perl -w
1821
 
1822
  use Text::Wrap;
1823
 
1824
  my $test_text =
1825
  "This script demonstrates a bug in Text::Wrap.  The very long line
1826
  following this paragraph will be relocated relative to the surrounding
1827
  text:
1828
 
1829
  ====================================================================
1830
 
1831
  See?  When the bug happens, we'll get the line of equal signs below
1832
  this paragraph, even though it should be above.";
1833
 
1834
 
1835
  # Print out the test text with no wrapping:
1836
  print "$test_text";
1837
  print "\n";
1838
  print "\n";
1839
 
1840
  # Now print it out wrapped, and see the bug:
1841
  print wrap ("\t", "        ", "$test_text");
1842
  print "\n";
1843
  print "\n";
1844
 
1845
If the line of equal signs were one shorter, then the bug doesn't
1846
happen.  Interesting.
1847
 
1848
Anyway, rather than fix this in Text::Wrap, we might as well write a
1849
new wrap() which has the following much-needed features:
1850
 
1851
* initial indentation, like current Text::Wrap()
1852
* subsequent line indentation, like current Text::Wrap()
1853
* user chooses among: force-break long words, leave them alone, or die()?
1854
* preserve existing indentation: chopped chunks from an indented line
1855
  are indented by same (like this line, not counting the asterisk!)
1856
* optional list of things to preserve on line starts, default ">"
1857
 
1858
Note that the last two are essentially the same concept, so unify in
1859
implementation and give a good interface to controlling them.
1860
 
1861
And how about:
1862
 
1863
Optionally, when encounter a line pre-indented by same as previous
1864
line, then strip the newline and refill, but indent by the same.
1865
Yeah...