Sep 07 2015
Improving the Tag Cloud script
The script used to update the Tag Cloud has a couple of issues.
- Tags are not counted correctly when the Markdown file has been saved using DOS/Windows style line endings
- All Markdown files are included even if a file contains a scheduled post
The following sections describe the changes to the script upd_tagcloud.sh. The final version of the script can be downloaded here.
(1) Ensure proper (Unix style) line-endings
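If a Markdown file has been saved using DOS/Windows style line-endings, the carriage return at the end of the Tags: line becomes part of the last tag. The same tag then shows up twice in the tag cloud, once without and once with the stray line break: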
<li><a href="http://sharedmemorydump.net/tagged/font-awesome">font-awesome</a> <span class="badge">1</span></li>
<li><a href="http://sharedmemorydump.net/tagged/font-awesome
">font-awesome
</a> <span class="badge">1</span></li>
While this would be the desired result:
<li><a href="http://sharedmemorydump.net/tagged/font-awesome">font-awesome</a> <span class="badge">2</span></li>
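To see why the counts come out wrong, the following sketch feeds two made-up Tags: lines through the same kind of pipeline the script uses, one of them ending in a carriage return. The sample lines and the function name count_tags are illustrative assumptions and not part of upd_tagcloud.sh.

```sh
#!/bin/sh
# Hypothetical helper: count tags from "Tags:" lines read on stdin,
# using the same grep/sed/awk/sort/uniq steps as the tag cloud script.
count_tags() {
    grep "^Tags:" | sed 's/Tags:[ ]*//' \
        | awk -F ', ' '{for(i=1;i<=NF;i++){print $i}}' \
        | sort -f | uniq -c
}

# Two made-up posts with the same tag; the second line ends in CR LF.
sample='Tags: font-awesome\nTags: font-awesome\r\n'

# Number of distinct tags reported with the carriage return left in place.
with_cr=$(printf "$sample" | count_tags | wc -l | tr -d ' ')

# The same input after stripping the carriage returns with tr -d.
without_cr=$(printf "$sample" | tr -d '\r' | count_tags | wc -l | tr -d ' ')

echo "distinct tags with CR:    $with_cr"
echo "distinct tags without CR: $without_cr"
```

With the carriage return in place, uniq sees two different strings and reports each with a count of 1. Once it is removed, both lines collapse into a single tag with a count of 2.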
The Unix command tr can be used to translate or delete characters. In this case we want to remove all carriage returns (CR) from the files via the -d option:
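# Ensure correct line-endings
for filename in *.md; do
    # Only rewrite files that actually contain a carriage return
    if grep -q $'\r' "$filename"; then
        # \015 is the octal code for CR
        content=$(tr -d '\015' < "$filename")
        # printf writes the content back without interpreting backslashes
        printf '%s\n' "$content" > "$filename"
    fi
done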
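As a variant of the loop above, the tr output can go to a temporary file that is then copied back over the original. This avoids holding the whole file in a shell variable, where command substitution drops trailing blank lines. The sketch below is one way to do that, not the script's own method. The function name strip_cr and the throwaway demo file are assumptions for illustration, and mktemp is assumed to be available (it is on common Linux and macOS systems).

```sh
#!/bin/sh
# Hypothetical helper: remove carriage returns from one file in place.
strip_cr() {
    tmp=$(mktemp) || return 1
    # tr -d deletes every CR; '\015' is the octal code for carriage return.
    if tr -d '\015' < "$1" > "$tmp"; then
        # Writing back with cat keeps the original file's permissions.
        cat "$tmp" > "$1"
    fi
    rm -f "$tmp"
}

# Demo on a throwaway file with DOS line endings.
demo=$(mktemp)
printf 'Title: Test\r\nTags: font-awesome\r\n' > "$demo"
strip_cr "$demo"

# Count the carriage returns and lines that are left after the conversion.
remaining=$(tr -cd '\015' < "$demo" | wc -c | tr -d ' ')
line_count=$(wc -l < "$demo" | tr -d ' ')
rm -f "$demo"

echo "carriage returns left: $remaining"
echo "lines in file: $line_count"
```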
(2) Proper handling of scheduled posts
The original line of code to extract the tags from all of the posts was
tags=$(cat *.md | grep "^Tags:" | sed 's/Tags:[ ]*//' | awk -F ', ' '{for(i=1;i<=NF;i++){print $i}}' | sort -f | uniq -c)
To include the filtering of files based on the Date: meta data within the post it has been changed to the following:
# Get tags from only the non-scheduled posts
# ($today must hold the current time as YYYYMMDDHHMM, e.g. today=$(date +%Y%m%d%H%M))
tags=""
for filename in *.md; do
    # Strip separators from the date and pad with zeros to 12 digits (YYYYMMDDHHMM)
    postdate=$(grep "^Date:" "$filename" | sed 's/Date:[ ]*//' | tr -d ' :-')'000000000000'
    postdate=${postdate:0:12}
    if [[ "$postdate" -le "$today" ]]; then
        tags="$tags"$'\n'$(grep "^Tags:" "$filename" | sed 's/Tags:[ ]*//' | awk -F ', ' '{for(i=1;i<=NF;i++){print $i}}')
    fi
done
# Sort and count unique tags
tags=$(echo "$tags" | sed '/^$/d' | sort -f | uniq -c)
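The date handling above reduces each Date: value to a 12-digit YYYYMMDDHHMM number, so a plain numeric comparison decides whether a post is still scheduled. Here is a minimal sketch of that step on its own, assuming dates are written like 2015-09-07 08:30 or just 2015-09-07. The helper names normalize_date and is_published are hypothetical, and the sketch uses cut instead of bash's substring expansion so that it also runs under plain sh.

```sh
#!/bin/sh
# Hypothetical helper: turn "2015-09-07 08:30" into 201509070830.
# Missing time parts are padded with zeros, extra digits (seconds) are cut off.
normalize_date() {
    printf '%s000000000000\n' "$(printf '%s' "$1" | tr -d ' :-')" | cut -c1-12
}

# Hypothetical helper: succeed if the post date is not after the reference time.
# $1 = post date as written in the file, $2 = reference time as YYYYMMDDHHMM.
is_published() {
    [ "$(normalize_date "$1")" -le "$2" ]
}

# Fixed reference time for the demo; in the script this role is played by $today.
now=201509071200

d1=$(normalize_date "2015-09-07 08:30")
d2=$(normalize_date "2015-09-08")

echo "$d1"
echo "$d2"
is_published "2015-09-07 08:30" "$now" && echo "first post: published"
is_published "2015-09-08" "$now" || echo "second post: scheduled"
```

A date without a time is padded to midnight, so a post dated for a later day stays out of the tag cloud until that day begins.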