<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>mattdorn.com &#187; apache</title>
	<atom:link href="http://www.mattdorn.com/content/tag/apache/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.mattdorn.com</link>
	<description>Generously funded by Matt Dorn</description>
	<lastBuildDate>Sun, 07 Feb 2010 00:07:14 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Getting AWStats to show Plone-authenticated users</title>
		<link>http://www.mattdorn.com/content/getting-awstats-to-show-plone-authenticated-users/</link>
		<comments>http://www.mattdorn.com/content/getting-awstats-to-show-plone-authenticated-users/#comments</comments>
		<pubDate>Sun, 15 Oct 2006 19:12:36 +0000</pubDate>
		<dc:creator>mdorn</dc:creator>
				<category><![CDATA[technology]]></category>
		<category><![CDATA[apache]]></category>
		<category><![CDATA[awstats]]></category>
		<category><![CDATA[plone]]></category>
		<category><![CDATA[zope]]></category>

		<guid isPermaLink="false">http://67.207.132.145/wordpress/?p=36</guid>
		<description><![CDATA[


I&#8217;m using AWStats to track usage on a Plone site that&#8217;s essentially a portal for collaborative document translation.  That means that most users of the site need to log in to do anything useful, and I want to see who&#8217;s logging in along with the rest of the stats on my site.
For hosting the site, [...]]]></description>
			<content:encoded><![CDATA[
<div class="document">
<!-- -*- mode: rst -*- -->
<p>I&#8217;m using AWStats to track usage on a Plone site that&#8217;s essentially a portal for collaborative document translation.  That means that most users of the site need to log in to do anything useful, and I want to see who&#8217;s logging in along with the rest of the stats on my site.</p>
<p>For hosting the site, I&#8217;m using the recommended setup, which uses Zope&#8217;s Virtual Host Monster to route requests and responses between Apache and the Zope application server.  That is, I include the following two key lines in my Apache virtual host configuration:</p>
<pre class="literal-block">
RewriteEngine On
RewriteRule ^/(.*) http://localhost:8080/VirtualHostBase/http/%{SERVER_NAME}:80/MyApp/VirtualHostRoot/$1 [NC,P,L]
</pre>
<p>Since I&#8217;m going through Apache, I can record requests in log files as with any other site.  Unfortunately, Zope/Plone does not expose the users who authenticate via its own system in the HTTP headers, so Apache has nothing to log for them.  Taking a cue from <a class="reference" href="http://mail.zope.org/pipermail/zope/2006-May/166316.html">this thread</a>, I came up with the following solution.  It&#8217;s tested on Plone 2.1.3, but probably won&#8217;t work on Plone 2.5, given the latter&#8217;s use of Pluggable Authentication Service.</p>
<p>First, I found a place in the Plone page rendering mechanism that gets executed on each page view.  I suppose the place that I chose was somewhat arbitrary, but in the &quot;authenticate&quot; method of the file <tt class="docutils literal"><span class="pre">GroupUserFolder/GroupUserFolder.py</span></tt>, I set a header called <tt class="docutils literal"><span class="pre">X-PloneUser</span></tt>, as shown in the following patch, available for your use:</p>
<pre class="literal-block">
1019a1020,1022
&gt;         # PATCH FOR TRACKING AUTH USER IN APACHE LOGS --mdorn:
&gt;         if name is not None:
&gt;             request.RESPONSE.setHeader('X-PloneUser', name)
</pre>
<p>In my Apache configuration, I had to change <tt class="docutils literal"><span class="pre">LogFormat</span></tt> from &quot;combined&quot; to something more specific, and reference the label in the <tt class="docutils literal"><span class="pre">CustomLog</span></tt>:</p>
<pre class="literal-block">
LogFormat &quot;%h %l %{X-Ploneuser}o %t \&quot;%r\&quot; %&gt;s %b \&quot;%{Referer}i\&quot; \&quot;%{User-agent}i\&quot;&quot; plone
CustomLog &quot;|/usr/local/sbin/cronolog /home/httpd/MYDOMAIN/logs/%m-%Y/access_log&quot; plone
</pre>
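<p>Before involving AWStats, the new third field can be tallied straight from the log as a quick sanity check.  This is only a sketch: the function name <tt class="docutils literal"><span class="pre">count_plone_users</span></tt> is made up for illustration, and it assumes the log was written with the <tt class="docutils literal"><span class="pre">plone</span></tt> format above, so that the third field holds the <tt class="docutils literal"><span class="pre">X-PloneUser</span></tt> value, or a dash when the header was not set:</p>
<pre class="literal-block">
#!/bin/bash
# Hypothetical helper: count requests per authenticated user.
# Assumes field 3 of each log line is the X-PloneUser value
# ("-" when the header was absent) and that user names contain no spaces.
count_plone_users() {
    awk '$3 != "-" { hits[$3]++ } END { for (u in hits) print u, hits[u] }' "$1" | sort
}
</pre>
<p>Calling <tt class="docutils literal"><span class="pre">count_plone_users</span></tt> with the path to an access log prints one line per user name followed by that user&#8217;s request count.</p>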
<p>To see authenticated users in your AWStats report, you&#8217;ll need to change the following setting in your AWStats configuration file from its default value:</p>
<pre class="literal-block">
ShowAuthenticatedUsers=1
</pre>
<p>That&#8217;s it.  Next time AWStats processes your logs, assuming you had visits from authenticated users, you&#8217;ll now see them in the appropriate location on your AWStats report.</p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.mattdorn.com/content/getting-awstats-to-show-plone-authenticated-users/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Automating AWStats configuration for multiple domains</title>
		<link>http://www.mattdorn.com/content/automating-awstats-configuration-for-multiple-domains/</link>
		<comments>http://www.mattdorn.com/content/automating-awstats-configuration-for-multiple-domains/#comments</comments>
		<pubDate>Sun, 15 Oct 2006 18:34:52 +0000</pubDate>
		<dc:creator>mdorn</dc:creator>
				<category><![CDATA[technology]]></category>
		<category><![CDATA[apache]]></category>
		<category><![CDATA[awstats]]></category>
		<category><![CDATA[sysadmin]]></category>

		<guid isPermaLink="false">http://67.207.132.145/wordpress/?p=35</guid>
		<description><![CDATA[


Like most Web sites, this site shares server space with a number of other domains.  When I recently undertook to set up AWStats after years of not knowing anything about what kind of traffic my personal sites were getting, I figured it would probably be relatively easy to make it so that anybody else [...]]]></description>
			<content:encoded><![CDATA[
<div class="document">
<!-- -*- mode: rst -*- -->
<p>Like most Web sites, this site shares server space with a number of other domains.  When I recently undertook to set up AWStats after years of not knowing anything about what kind of traffic my personal sites were getting, I figured it would probably be relatively easy to make it so that anybody else with an account on the box who also wanted stats could avoid the hassle.</p>
<p>With a bit of help from <a class="reference" href="http://www.dotvoid.com/view.php?id=29">another tutorial</a>, I came up with the following solution.</p>
<p>First of all, I installed the AWStats package on my server&#8217;s Fedora system via yum.  With that installation, I ended up with AWStats&#8217; default config file in <tt class="docutils literal"><span class="pre">/etc/awstats/awstats.model.conf</span></tt>.</p>
<p>This setup needs a separate AWStats configuration file for each domain, so this default file is the model from which all others will be generated.  For that reason, I needed to tweak a few of the default values to work with the script described later:</p>
<pre class="literal-block">
LogFile=&quot;/home/httpd/$USERNAME/logs/%MM-0-%YYYY-0/access_log&quot;
SiteDomain=&quot;$DOMAIN&quot;
HostAliases=&quot;$ALIASES&quot;
DirData=&quot;/home/httpd/$USERNAME/awstats&quot;
AllowAccessFromWebToFollowingAuthenticatedUsers=&quot;$USERNAME&quot;
</pre>
<p>(I should also note that I enabled the GeoIP plugin in this file, but that&#8217;s a subject for a different post.)</p>
<p>Now, on the Apache side of things, you need to decide on a standard location for the Apache log files for all the accounts.  In my case, that&#8217;s /home/httpd/ACCOUNT_NAME/logs.  Within that directory, the setup described below creates subdirectories named by month and year (MM-YYYY) and puts the log file there.  The Apache virtual host directives will of course need to specify that location.  Since I don&#8217;t actually know or have any communication with my neighbors who share the box, I&#8217;ll leave that to them, but my new directives look like this:</p>
<pre class="literal-block">
CustomLog &quot;|/usr/local/sbin/cronolog /home/httpd/DOMAIN/logs/%m-%Y/access_log&quot; combined
</pre>
<p>Where <tt class="docutils literal"><span class="pre">DOMAIN</span></tt> is the name of the directory that holds the Web files (htdocs, logs, etc.) for a given domain on the server.</p>
<p>I&#8217;ve also installed <a class="reference" href="http://cronolog.org/">cronolog</a> as an easy way to rotate logs.  As you can see, the Apache config pipes to that program.  That&#8217;s not strictly necessary for this setup, but if you want to do it differently (e.g., with logrotate), you&#8217;re on your own.</p>
<p>With this in place, anytime you want AWStats set up for a new domain, you can use the following script, which is named <tt class="docutils literal"><span class="pre">/usr/local/sbin/awstats_script.sh</span></tt> in my setup, and which was adapted from the <a class="reference" href="http://www.dotvoid.com/view.php?id=29">tutorial</a> mentioned previously:</p>
<pre class="literal-block">
#!/bin/bash
# Prompt for the account details, then generate a per-domain AWStats config.
echo &quot;Enter the username:&quot;
read -r WUSERNAME

ACCESS_FILE=&quot;/etc/awstats/awstats.pwd&quot;
STAT_DIR=&quot;/home/httpd/$WUSERNAME/awstats&quot;

# -s keeps the password from being echoed to the terminal
echo &quot;Enter the password:&quot;
read -rs PASSWORD

echo &quot;Enter the main domain:&quot;
read -r DOMAIN

echo &quot;Enter aliases separated by space:&quot;
read -r ALIASES

# Create the statistics directory
if [ -d &quot;$STAT_DIR&quot; ]; then
    echo &quot;Statistics dir already exists&quot;
else
    mkdir -p &quot;$STAT_DIR&quot;
fi

# Create the virtual host awstats.conf by filling in the model's placeholders
sed -e &quot;s/\\\$DOMAIN/$DOMAIN/g&quot; \
    -e &quot;s/\\\$USERNAME/$WUSERNAME/g&quot; \
    -e &quot;s/\\\$ALIASES/$ALIASES/g&quot; \
    /etc/awstats/awstats.model.conf &gt; &quot;/etc/awstats/awstats.$DOMAIN.conf&quot;

# Add user/password to password file (-c creates the file if it is missing)
if [ -e &quot;$ACCESS_FILE&quot; ]; then
    /usr/bin/htpasswd -bm &quot;$ACCESS_FILE&quot; &quot;$WUSERNAME&quot; &quot;$PASSWORD&quot;
else
    /usr/bin/htpasswd -bm -c &quot;$ACCESS_FILE&quot; &quot;$WUSERNAME&quot; &quot;$PASSWORD&quot;
fi
</pre>
<p>The &quot;username&quot; requested is the name found in the <tt class="docutils literal"><span class="pre">/home/httpd</span></tt> directory for which you want to create the site.  This will also be the username needed to log in (via Apache authentication) to view the stats.  The &quot;password&quot; then requested is for that same Apache authentication (it will be stored in <tt class="docutils literal"><span class="pre">/etc/awstats/awstats.pwd</span></tt>).  The &quot;main domain&quot; is what people use to access the site (e.g., www.DOMAIN.com), but you&#8217;ll also then be asked to include any subdomains that log to the same Apache log file (at least if you want AWStats to process them).</p>
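<p>The substitution step can also be tried on its own, without prompts and without writing anything under <tt class="docutils literal"><span class="pre">/etc</span></tt>.  The following is a sketch, not part of my actual setup: the function name <tt class="docutils literal"><span class="pre">render_awstats_conf</span></tt> is hypothetical, and it assumes that none of the values contain a slash, which would clash with the sed delimiter:</p>
<pre class="literal-block">
#!/bin/bash
# Hypothetical helper: fill in the model's placeholders and print the result.
# Arguments: model file, username, main domain, aliases (one quoted string).
# Assumes the values contain no "/" characters.
render_awstats_conf() {
    local model="$1" username="$2" domain="$3" aliases="$4"
    sed -e "s/\\\$DOMAIN/$domain/g" \
        -e "s/\\\$USERNAME/$username/g" \
        -e "s/\\\$ALIASES/$aliases/g" \
        "$model"
}
</pre>
<p>For example, <tt class="docutils literal"><span class="pre">render_awstats_conf /etc/awstats/awstats.model.conf ACCOUNT_NAME www.DOMAIN.com "DOMAIN.com"</span></tt> prints the generated configuration to the terminal so it can be reviewed before the real script is run.</p>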
<p>After the setup is complete, you can check out your stats by going to: <a class="reference" href="http://www.DOMAIN.com/awstats/awstats.pl">http://www.DOMAIN.com/awstats/awstats.pl</a></p>
<p>AWStats knows which config file (automatically generated) it needs by reading the domain from the URL, but you can also view other domains by appending &quot;?config=www.whatever.com&quot; to the URL.  That config file is left in <tt class="docutils literal"><span class="pre">/etc/awstats</span></tt>, and the individual AWStats DB file that results from processing the logs is left in each user&#8217;s directory in <tt class="docutils literal"><span class="pre">/home/httpd/DOMAIN/awstats</span></tt>.</p>
<p>While each user will be able to HTTP-authenticate against any other user&#8217;s domain, the value of the <tt class="docutils literal"><span class="pre">AllowAccessFromWebToFollowingAuthenticatedUsers</span></tt> variable in each config file will prevent users from being able to view one another&#8217;s stats.</p>
<p>Finally, note that your AWStats installation may have included a cron task to update the AWStats data in <tt class="docutils literal"><span class="pre">/etc/cron.hourly</span></tt> or a similar location.  In any case, you&#8217;ll want AWStats to automatically process your logs at least daily via the included script <tt class="docutils literal"><span class="pre">/usr/share/awstats/tools/awstats_updateall.pl</span></tt>.</p>
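<p>If no such cron task exists, a small wrapper along these lines could be dropped into a daily cron directory.  Treat it as a sketch under assumptions: the function name is made up, and the location of <tt class="docutils literal"><span class="pre">awstats.pl</span></tt> is a guess that depends on how your package was installed, so check both paths before relying on it:</p>
<pre class="literal-block">
#!/bin/bash
# Hypothetical daily cron wrapper for awstats_updateall.pl.
# AWSTATS_PROG is an assumed install location; adjust it for your system.
AWSTATS_PROG="${AWSTATS_PROG:-/usr/share/awstats/wwwroot/cgi-bin/awstats.pl}"
CONFIG_DIR="${CONFIG_DIR:-/etc/awstats}"
UPDATE_ALL="${UPDATE_ALL:-/usr/share/awstats/tools/awstats_updateall.pl}"

# Update the stats for every awstats.*.conf found in CONFIG_DIR.
update_all_stats() {
    "$UPDATE_ALL" now -awstatsprog="$AWSTATS_PROG" -configdir="$CONFIG_DIR"
}
</pre>
<p>Saved as an executable file with a final line that calls <tt class="docutils literal"><span class="pre">update_all_stats</span></tt>, this processes every generated config in one pass, so newly added domains are picked up without further cron changes.</p>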
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.mattdorn.com/content/automating-awstats-configuration-for-multiple-domains/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Robots attack</title>
		<link>http://www.mattdorn.com/content/robots-attack/</link>
		<comments>http://www.mattdorn.com/content/robots-attack/#comments</comments>
		<pubDate>Wed, 25 May 2005 14:08:02 +0000</pubDate>
		<dc:creator>mdorn</dc:creator>
				<category><![CDATA[technology]]></category>
		<category><![CDATA[apache]]></category>
		<category><![CDATA[sysadmin]]></category>

		<guid isPermaLink="false">http://67.207.132.145/wordpress/?p=6</guid>
		<description><![CDATA[


Once you have a Web site that gets a moderate amount of traffic, it&#8217;s likely to become the object of the attention of &#34;robots&#34; that crawl your site for the purpose of archiving or indexing its content or otherwise making the site available offline. We have one site whose number of &#34;not viewed&#34; pages exceeds [...]]]></description>
			<content:encoded><![CDATA[
<div class="document">
<!-- -*- mode: rst -*- -->
<p>Once you have a Web site that gets a moderate amount of traffic, it&#8217;s likely to become the object of the attention of &quot;robots&quot; that crawl your site for the purpose of archiving or indexing its content or otherwise making the site available offline. We have one site whose number of &quot;not viewed&quot; pages exceeds the number of (presumably) human-viewed pages by more than three times, according to our <a class="reference" href="http://awstats.sourceforge.net/">AWStats</a> numbers. While our hosting provider&#8217;s bandwidth allocation is generous, we&#8217;re using a particularly resource-demanding application server for the site, so I&#8217;m much more concerned about the strain this activity puts on other, physical resources, especially memory and CPU.</p>
<p>The real answer in our particular case is to tame the site with a caching strategy and other methods so that it can better tolerate high traffic, but that involves an effort that we haven&#8217;t had the time or resources to undertake. So Plan B is to take a closer look at exactly who these robots are, and whether they really need to be looking at the site, because there are a couple of different ways of blocking their access.</p>
<p>Some of them, like Googlebot, you obviously want to give unrestricted access to your site, because they&#8217;re the mechanism by which search engines index your site and allow their users to find your content as a result of their keyword searches.</p>
<p>These legitimate bots, though, are not so well-behaved as you might expect. Yahoo!&#8217;s &quot;Slurp&quot; indexing bot, for example, single-handedly accounted for well over a gigabyte of bandwidth (and well over 10% of the traffic) in a month (it&#8217;s an extensive site but not that extensive). Is this really necessary? I don&#8217;t know enough about how these bots work to know for sure, but it strikes me as excessive.</p>
<p>These bots typically identify themselves in your Web server access logs along with other information such as the IP address:</p>
<blockquote>
MISSING TEXT</blockquote>
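<p>To get a feel for which agents are the busiest, the user agent field can be tallied directly from the log.  This is a rough sketch rather than anything AWStats-specific: the function name <tt class="docutils literal"><span class="pre">top_user_agents</span></tt> is made up, and it assumes the log uses Apache&#8217;s &quot;combined&quot; format, where the user agent is the last quoted field on each line:</p>
<pre class="literal-block">
#!/bin/bash
# Hypothetical helper: list the most frequent user agents in an access log.
# Assumes the "combined" format, so splitting each line on double quotes
# leaves the user agent in field 6.
# Arguments: log file, optional number of agents to show (default 10).
top_user_agents() {
    awk -F'"' '{ print $6 }' "$1" | sort | uniq -c | sort -rn | head -n "${2:-10}"
}
</pre>
<p>Each output line is a request count followed by the agent string, with the heaviest agents at the top.</p>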
<p>Legitimate search engines can be instructed not to crawl your site, or to ignore sections of it, with a <tt class="docutils literal"><span class="pre">robots.txt</span></tt> file that follows the <a class="reference" href="http://www.robotstxt.org/wc/robots.html">robots exclusion standard</a>, but most non-mainstream robots ignore this file anyway, so a more reliable method is using Apache&#8217;s Rewrite module.</p>
<p>So if you wanted to block a bot such as Slurp, you might use Apache&#8217;s Rewrite module to send it a &quot;forbidden&quot; response like so:</p>
<pre class="literal-block">
RewriteCond %{HTTP_USER_AGENT} ^(.*)Yahoo(.*)
RewriteRule .* - [F,L]
</pre>
<p>(You can probably use regular expressions with a bit more precision than I do here.)</p>
<p>There&#8217;s the additional matter of whether these bots are really who they say they are. The information illustrated in the above access log entry is easy to spoof, including the IP and the user agent information at the end. Bad bots can even impersonate a browser like Microsoft Internet Explorer. If IPs aren&#8217;t being spoofed, you can also use Apache Rewrites to block by IP, but if they are, what can be done?</p>
<p>What I haven&#8217;t had much time to investigate is what purpose these rogue bots serve and to whom. Going down that path just a little, you find yourself in a sordid world of things like <a class="reference" href="http://www.komar.org/faq/scumbags/referrer-log-spamming/">referrer log spamming</a> (which I now realize we&#8217;re also being victimized by) and in the company of lots of <a class="reference" href="http://www.webmasterworld.com/">other folks</a> who are trying to navigate the treacherous waters of a sea that used to be pretty calm as far as these things go.</p>
<p>For now, we&#8217;re sticking with the blunt instrument of blocking virtually all non-browser agents except Googlebot, and hoping that user agent spoofing is relatively rare.</p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.mattdorn.com/content/robots-attack/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
