<?xml version="1.0" encoding="UTF-8"?> <rss
version="2.0"
xmlns:content="http://purl.org/rss/1.0/modules/content/"
xmlns:wfw="http://wellformedweb.org/CommentAPI/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:atom="http://www.w3.org/2005/Atom"
xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
> <channel><title>2657 Productions News &#187; (all categories)</title> <atom:link href="http://news.mrdwab.com/category/all-categories/feed/" rel="self" type="application/rss+xml" /><link>http://news.mrdwab.com</link> <description>..:: Whereabouts and Whatabouts of the 2657 World ::..</description> <lastBuildDate>Mon, 16 Jan 2012 05:56:16 +0000</lastBuildDate> <language>en</language> <sy:updatePeriod>hourly</sy:updatePeriod> <sy:updateFrequency>1</sy:updateFrequency> <item><title>I&#8217;m not at all religious, but&#8230;</title><link>http://news.mrdwab.com/2011/06/26/im-not-at-all-religious-but/</link> <comments>http://news.mrdwab.com/2011/06/26/im-not-at-all-religious-but/#comments</comments> <pubDate>Sun, 26 Jun 2011 12:43:03 +0000</pubDate> <dc:creator>Ananda</dc:creator> <category><![CDATA[(all categories)]]></category> <category><![CDATA[Humor]]></category> <category><![CDATA[Pictures]]></category> <category><![CDATA[Ambika!]]></category> <category><![CDATA[silly pictures]]></category> <guid
isPermaLink="false">http://news.mrdwab.com/?p=1199</guid> <description><![CDATA[&#8230; here is a goddess that I am happy to worship&#8230; I also had a few alternatives&#8211;and I&#8217;m still not sure which one is my favorite.]]></description> <content:encoded><![CDATA[<p>&#8230; here is a goddess that I am happy to worship&#8230;</p><div
id="attachment_1200" class="wp-caption aligncenter" style="width: 410px"><a
href="http://news.mrdwab.com/2011/06/26/im-not-at-all-religious-but/ambika-border-1/" rel="attachment wp-att-1200"><img
src="http://news.mrdwab.com/wp-content/uploads/2011/06/Ambika-Border-1-400x400.jpg" alt="" title="Ambika - Border 1" width="400" height="400" class="size-medium wp-image-1200" /></a><p
class="wp-caption-text">Don&#039;t make me squirt my milk bottle at you!</p></div><p><span
id="more-1199"></span></p><p>I also had a few alternatives&#8211;and I&#8217;m still not sure which one is my favorite.</p> <a
href='http://news.mrdwab.com/2011/06/26/im-not-at-all-religious-but/ambika-border-1/' title='Don&#039;t make me squirt my milk bottle at you!'><img
width="150" height="150" src="http://news.mrdwab.com/wp-content/uploads/2011/06/Ambika-Border-1-150x150.jpg" class="attachment-thumbnail" alt="Don&#039;t make me squirt my milk bottle at you!" title="Don&#039;t make me squirt my milk bottle at you!" /></a> <a
href='http://news.mrdwab.com/2011/06/26/im-not-at-all-religious-but/ambika-border-sepia/' title='One of the ancient goddesses...'><img
width="150" height="150" src="http://news.mrdwab.com/wp-content/uploads/2011/06/Ambika-Border-Sepia-150x150.jpg" class="attachment-thumbnail" alt="One of the ancient goddesses..." title="One of the ancient goddesses..." /></a> <a
href='http://news.mrdwab.com/2011/06/26/im-not-at-all-religious-but/ambika-colored-pencil/' title='In a coloring book coming to a bookstore near you!'><img
width="150" height="150" src="http://news.mrdwab.com/wp-content/uploads/2011/06/Ambika-Colored-Pencil-150x150.jpg" class="attachment-thumbnail" alt="In a coloring book coming to a bookstore near you!" title="In a coloring book coming to a bookstore near you!" /></a> <a
href='http://news.mrdwab.com/2011/06/26/im-not-at-all-religious-but/ambika-third-eye/' title='The third eye begins to glow just before Kali begins to emerge from Ambika.'><img
width="150" height="150" src="http://news.mrdwab.com/wp-content/uploads/2011/06/Ambika-Third-Eye-150x150.jpg" class="attachment-thumbnail" alt="The third eye begins to glow just before Kali begins to emerge from Ambika." title="The third eye begins to glow just before Kali begins to emerge from Ambika." /></a> <a
href='http://news.mrdwab.com/2011/06/26/im-not-at-all-religious-but/ambika/' title='The original....'><img
width="150" height="150" src="http://news.mrdwab.com/wp-content/uploads/2011/06/Ambika-150x150.jpg" class="attachment-thumbnail" alt="The original...." title="The original...." /></a> ]]></content:encoded> <wfw:commentRss>http://news.mrdwab.com/2011/06/26/im-not-at-all-religious-but/feed/</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item><title>On the trucks around town&#8230;</title><link>http://news.mrdwab.com/2011/06/15/on-the-trucks-around-town/</link> <comments>http://news.mrdwab.com/2011/06/15/on-the-trucks-around-town/#comments</comments> <pubDate>Wed, 15 Jun 2011 04:40:22 +0000</pubDate> <dc:creator>Ananda</dc:creator> <category><![CDATA[(all categories)]]></category> <category><![CDATA[(non) fiction]]></category> <category><![CDATA[Humor]]></category> <category><![CDATA[India]]></category> <category><![CDATA[Pictures]]></category> <category><![CDATA[Ambika!]]></category> <category><![CDATA[silly pictures]]></category> <category><![CDATA[we two ours one]]></category> <guid
isPermaLink="false">http://news.mrdwab.com/?p=1189</guid> <description><![CDATA[Anyone who has spent some time in India is sure to have noticed the slogans painted on the back of trucks, autos, and other vehicles advising &#8220;we two, ours one&#8221;. This is part of India&#8217;s &#8220;family planning&#8221; efforts&#8211;efforts which have had a pretty bumpy history that included a forced sterilization program. Originally, the slogans were [...]]]></description> <content:encoded><![CDATA[<p>Anyone who has spent some time in India is sure to have noticed the slogans painted on the back of trucks, autos, and other vehicles advising &#8220;we two, ours one&#8221;. This is part of India&#8217;s &#8220;<a
href="http://en.wikipedia.org/wiki/Human_population_control#India">family planning</a>&#8221; efforts&#8211;efforts which have had a pretty <a
href="http://en.wikipedia.org/wiki/Family_planning_in_India">bumpy history</a> that included a forced sterilization program.</p><p>Originally, the slogans were &#8220;we two, ours two&#8221;, or at least that was the catchy English version&#8211;regional languages usually had a slogan more along the lines of &#8220;one family, two children&#8221;. And, the change to the new slogan led to at least one humorous math discussion with an auto driver who commented that, &#8220;Earlier, it was &#8216;we two, ours two&#8217;; now, it is &#8216;we two, ours one&#8217;. What&#8217;s next? &#8216;We two, ours half?&#8217;&#8221;</p><p>Anyway, keen observers might have noticed the following new addition to selected trucks:</p><p><a
href="http://news.mrdwab.com/2011/06/15/on-the-trucks-around-town/we-2-ours-1-1/" rel="attachment wp-att-1190"><img
src="http://news.mrdwab.com/wp-content/uploads/2011/06/We-2-Ours-1-1-400x300.jpg" alt="We two, ours one" title="We 2 Ours 1-1" width="400" height="300" class="aligncenter size-medium wp-image-1190" /></a></p><p><a
href="http://news.mrdwab.com/2011/06/15/on-the-trucks-around-town/we-2-ours-1-2/" rel="attachment wp-att-1193"><img
src="http://news.mrdwab.com/wp-content/uploads/2011/06/We-2-Ours-1-2-400x270.jpg" alt="We two, ours one" title="We 2 Ours 1-2" width="400" height="270" class="aligncenter size-medium wp-image-1193" /></a></p> ]]></content:encoded> <wfw:commentRss>http://news.mrdwab.com/2011/06/15/on-the-trucks-around-town/feed/</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item><title>Stratified Random Sampling in R&#8211;A Function in Progress</title><link>http://news.mrdwab.com/2011/05/15/stratified-random-sampling-in-r-beta/</link> <comments>http://news.mrdwab.com/2011/05/15/stratified-random-sampling-in-r-beta/#comments</comments> <pubDate>Sun, 15 May 2011 10:16:02 +0000</pubDate> <dc:creator>Ananda</dc:creator> <category><![CDATA[(all categories)]]></category> <category><![CDATA[Geekiness]]></category> <category><![CDATA[Useless Knowledge]]></category> <category><![CDATA[code]]></category> <category><![CDATA[experiments]]></category> <category><![CDATA[R]]></category> <category><![CDATA[R functions]]></category> <category><![CDATA[sampling]]></category> <category><![CDATA[statistics]]></category> <category><![CDATA[tapply()]]></category> <guid
isPermaLink="false">http://news.mrdwab.com/?p=1141</guid> <description><![CDATA[IMPORTANT: This is here mostly to remind me of how I solved my problem. You should read Stratified random sampling in R from a data frame if you really want to use this function. I know that sampling is quite complex, and I will admit that I know very little about its complexities. Fortunately, software [...]]]></description> <content:encoded><![CDATA[<blockquote><p><strong>IMPORTANT</strong>: This is here mostly to remind me of how I solved my problem. You should read <a
href="http://news.mrdwab.com/2011/05/20/stratified-random-sampling-in-r-from-a-data-frame/" title="Stratified random sampling in R from a data frame">Stratified random sampling in R from a data frame</a> if you really want to use this function.</p></blockquote><p>I know that sampling is quite complex, and I will admit that I know very little about its complexities. Fortunately, software like <a
href="http://www.r-project.org">R</a> lets you draw <a
href="http://news.mrdwab.com/2009/11/29/simple-sampling-with-r/" title="Simple sampling with R">simple random samples</a> pretty easily, either <a
href="http://news.mrdwab.com/2009/11/30/sampling-with-replacement-in-r/" title="Sampling with replacement in R">with</a> or without replacement. Unfortunately, I could not find any feature to allow me to do simple stratified random sampling, at least not with the features I was looking for. Fortunately again, with a little bit of experimenting, it can be pretty easy to learn how to write functions in R when a direct solution does not present itself.</p><p>This post shares my initial &#8220;work-in-progress&#8221; on writing an R function for stratified sampling.</p><p><span
id="more-1141"></span></p><h2>The problem&#8230;</h2><p>Here&#8217;s the minimum that I was hoping for:</p><ul><li>I wanted to be able to draw both a proportional sample (which is more common, for it allows you to make generalizations about the population as a whole) as well as a fixed-size sample (which is less common, but useful for making comparisons across groups).</li><li>I often use a seed when sampling, so I wanted that to be a part of the function.</li><li>I wanted the output to be the same as if I were to sample from each group individually.</li><li>I was hoping that my output could be stored as a new object that I could then reuse (either a list or a data frame, preferably the latter).</li></ul><p>My initial searches directed me to <a
href="http://yihui.name/r/stat/sampling_survey/stratified/index.htm" target="_blank">Yihui Xie&#8217;s page on stratified sampling using tapply()</a>. However, this option did not satisfy my needs. As far as I could tell, it only allowed me to take a fixed sample size from each group. Also, I wasn&#8217;t totally satisfied with the output.</p><p>Consider the following. With the tapply() approach from Yihui Xie&#8217;s example, the results differ from what you would get by sampling from each group separately with the same seed.</p><div
class="wp_codebox"><table><tr
id="p11416"><td
class="code" id="p1141code6"><pre class="rsplus" style="font-family:monospace;"><span style="color: #080;">&gt;</span> dat <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">data.<span style="">frame</span></span><span style="color: #080;">&#40;</span>x <span style="color: #080;">=</span> <span style="color: #ff0000;">1</span><span style="color: #080;">:</span><span style="color: #ff0000;">15</span>, stratum <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">gl</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">3</span>, <span style="color: #ff0000;">5</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
<span style="color: #080;">&gt;</span> dat
    x stratum
<span style="color: #ff0000;">1</span>   <span style="color: #ff0000;">1</span>       <span style="color: #ff0000;">1</span>
<span style="color: #ff0000;">2</span>   <span style="color: #ff0000;">2</span>       <span style="color: #ff0000;">1</span>
<span style="color: #ff0000;">3</span>   <span style="color: #ff0000;">3</span>       <span style="color: #ff0000;">1</span>
<span style="color: #ff0000;">4</span>   <span style="color: #ff0000;">4</span>       <span style="color: #ff0000;">1</span>
<span style="color: #ff0000;">5</span>   <span style="color: #ff0000;">5</span>       <span style="color: #ff0000;">1</span>
<span style="color: #ff0000;">6</span>   <span style="color: #ff0000;">6</span>       <span style="color: #ff0000;">2</span>
<span style="color: #ff0000;">7</span>   <span style="color: #ff0000;">7</span>       <span style="color: #ff0000;">2</span>
<span style="color: #ff0000;">8</span>   <span style="color: #ff0000;">8</span>       <span style="color: #ff0000;">2</span>
<span style="color: #ff0000;">9</span>   <span style="color: #ff0000;">9</span>       <span style="color: #ff0000;">2</span>
<span style="color: #ff0000;">10</span> <span style="color: #ff0000;">10</span>       <span style="color: #ff0000;">2</span>
<span style="color: #ff0000;">11</span> <span style="color: #ff0000;">11</span>       <span style="color: #ff0000;">3</span>
<span style="color: #ff0000;">12</span> <span style="color: #ff0000;">12</span>       <span style="color: #ff0000;">3</span>
<span style="color: #ff0000;">13</span> <span style="color: #ff0000;">13</span>       <span style="color: #ff0000;">3</span>
<span style="color: #ff0000;">14</span> <span style="color: #ff0000;">14</span>       <span style="color: #ff0000;">3</span>
<span style="color: #ff0000;">15</span> <span style="color: #ff0000;">15</span>       <span style="color: #ff0000;">3</span>
<span style="color: #080;">&gt;</span> <span style="color: #0000FF; font-weight: bold;">set.<span style="">seed</span></span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">1</span><span style="color: #080;">&#41;</span><span style="color: #080;">;</span> <span style="color: #0000FF; font-weight: bold;">tapply</span><span style="color: #080;">&#40;</span>dat$x, dat$stratum, <span style="color: #0000FF; font-weight: bold;">sample</span>, size <span style="color: #080;">=</span> <span style="color: #ff0000;">3</span><span style="color: #080;">&#41;</span>
$`<span style="color: #ff0000;">1</span>`
<span style="color: #080;">&#91;</span><span style="color: #ff0000;">1</span><span style="color: #080;">&#93;</span> <span style="color: #ff0000;">2</span> <span style="color: #ff0000;">5</span> <span style="color: #ff0000;">4</span>
&nbsp;
$`<span style="color: #ff0000;">2</span>`
<span style="color: #080;">&#91;</span><span style="color: #ff0000;">1</span><span style="color: #080;">&#93;</span> <span style="color: #ff0000;">10</span>  <span style="color: #ff0000;">6</span>  <span style="color: #ff0000;">8</span>
&nbsp;
$`<span style="color: #ff0000;">3</span>`
<span style="color: #080;">&#91;</span><span style="color: #ff0000;">1</span><span style="color: #080;">&#93;</span> <span style="color: #ff0000;">15</span> <span style="color: #ff0000;">13</span> <span style="color: #ff0000;">12</span>
&nbsp;
<span style="color: #080;">&gt;</span> <span style="color: #228B22;"># Compare with what we get when we sample individually:</span>
<span style="color: #080;">&gt;</span> <span style="color: #0000FF; font-weight: bold;">set.<span style="">seed</span></span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">1</span><span style="color: #080;">&#41;</span><span style="color: #080;">;</span> <span style="color: #0000FF; font-weight: bold;">sample</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">1</span><span style="color: #080;">:</span><span style="color: #ff0000;">5</span>, <span style="color: #ff0000;">3</span><span style="color: #080;">&#41;</span>
<span style="color: #080;">&#91;</span><span style="color: #ff0000;">1</span><span style="color: #080;">&#93;</span> <span style="color: #ff0000;">2</span> <span style="color: #ff0000;">5</span> <span style="color: #ff0000;">4</span>
<span style="color: #080;">&gt;</span> <span style="color: #0000FF; font-weight: bold;">set.<span style="">seed</span></span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">1</span><span style="color: #080;">&#41;</span><span style="color: #080;">;</span> <span style="color: #0000FF; font-weight: bold;">sample</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">6</span><span style="color: #080;">:</span><span style="color: #ff0000;">10</span>, <span style="color: #ff0000;">3</span><span style="color: #080;">&#41;</span>
<span style="color: #080;">&#91;</span><span style="color: #ff0000;">1</span><span style="color: #080;">&#93;</span>  <span style="color: #ff0000;">7</span> <span style="color: #ff0000;">10</span>  <span style="color: #ff0000;">9</span>
<span style="color: #080;">&gt;</span> <span style="color: #0000FF; font-weight: bold;">set.<span style="">seed</span></span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">1</span><span style="color: #080;">&#41;</span><span style="color: #080;">;</span> <span style="color: #0000FF; font-weight: bold;">sample</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">11</span><span style="color: #080;">:</span><span style="color: #ff0000;">15</span>, <span style="color: #ff0000;">3</span><span style="color: #080;">&#41;</span>
<span style="color: #080;">&#91;</span><span style="color: #ff0000;">1</span><span style="color: #080;">&#93;</span> <span style="color: #ff0000;">12</span> <span style="color: #ff0000;">15</span> <span style="color: #ff0000;">14</span></pre></td></tr></table></div><p>I suspect this has less to do with sampling theory than with how R handles its random number generator: in the tapply() call the seed is set only once, so only the first group is drawn from the freshly seeded stream, while each later group continues from wherever the generator left off. That would explain why group 1 matches and groups 2 and 3 do not.</p><h2>Stratified sampling, Mr. DWAB style&#8230;</h2><p>The solution I arrived at is to use &#8220;unstack()&#8221; and a few conditional loops to take the samples.</p><p>And, without more rambling, here&#8217;s what I came up with.</p><div
class="wp_codebox"><table><tr
id="p11417"><td
class="code" id="p1141code7"><pre class="rsplus" style="font-family:monospace;">stratified <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">function</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">df</span>, size, seed, dframe<span style="color: #080;">=</span>FALSE, ...<span style="color: #080;">&#41;</span> <span style="color: #080;">&#123;</span>
         <span style="color: #228B22;"># USE: Start with a data frame with your cases in one column and your</span>
         <span style="color: #228B22;"># groups in another column. Decide whether you want to use a seed.</span>
         <span style="color: #228B22;"># If not, seed should be &quot;NO&quot; (with quotes). Decide whether you want your</span>
         <span style="color: #228B22;"># output as a data frame; by default, dframe is set to FALSE.</span>
         <span style="color: #228B22;"># To take a sample proportional to the population size in each group,</span>
         <span style="color: #228B22;"># enter &quot;size&quot; as a decimal. Otherwise, enter size as a whole number.</span>
         <span style="color: #228B22;">#</span>
         <span style="color: #228B22;"># Example 1a: To sample 10% of each group from a data frame named &quot;z&quot;</span>
         <span style="color: #228B22;"># and using a seed of &quot;1&quot;, use: &gt; stratified(z, .1, 1)</span>
         <span style="color: #228B22;"># Example 1b: To run the same sample as above but display the result as</span>
         <span style="color: #228B22;"># a data frame, use: &gt; stratified(z, .1, 1, T)</span>
         <span style="color: #228B22;">#</span>
         <span style="color: #228B22;"># Example 2: To sample 10% of each group from a data frame named &quot;z&quot;</span>
         <span style="color: #228B22;"># and using no seed, use: &gt; stratified(z, .1, &quot;NO&quot;)</span>
         <span style="color: #228B22;">#</span>
         <span style="color: #228B22;"># Example 3: To sample 5 from each group from a data frame named &quot;z&quot;</span>
         <span style="color: #228B22;"># and using a seed of 30, use: &gt; stratified(z, 5, 30)</span>
         <span style="color: #228B22;">#</span>
         <span style="color: #228B22;"># NOTE: Not recommended for datasets with LOTS of groups or with HUGE</span>
         <span style="color: #228B22;"># differences in group sizes.</span>
  k <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">unstack</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">df</span><span style="color: #080;">&#41;</span>
  <span style="color: #0000FF; font-weight: bold;">if</span> <span style="color: #080;">&#40;</span>dframe <span style="color: #080;">==</span> FALSE<span style="color: #080;">&#41;</span> <span style="color: #080;">&#123;</span>
    <span style="color: #0000FF; font-weight: bold;">if</span> <span style="color: #080;">&#40;</span>seed <span style="color: #080;">==</span> <span style="color: #ff0000;">&quot;NO&quot;</span> <span style="color: #080;">&amp;&amp;</span> size <span style="color: #080;">&lt;</span> <span style="color: #ff0000;">1</span><span style="color: #080;">&#41;</span> <span style="color: #080;">&#123;</span>
      <span style="color: #0000FF; font-weight: bold;">for</span> <span style="color: #080;">&#40;</span>i <span style="color: #0000FF; font-weight: bold;">in</span> <span style="color: #ff0000;">1</span><span style="color: #080;">:</span><span style="color: #0000FF; font-weight: bold;">length</span><span style="color: #080;">&#40;</span>k<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span> <span style="color: #080;">&#123;</span>
        N <span style="color: #080;">=</span> k<span style="color: #080;">&#91;</span><span style="color: #080;">&#91;</span>i<span style="color: #080;">&#93;</span><span style="color: #080;">&#93;</span>
        n <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">round</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">length</span><span style="color: #080;">&#40;</span>N<span style="color: #080;">&#41;</span><span style="color: #080;">*</span>size<span style="color: #080;">&#41;</span>
        pre <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">structure</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">list</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;Group&quot;</span> <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">names</span><span style="color: #080;">&#40;</span>k<span style="color: #080;">&#91;</span>i<span style="color: #080;">&#93;</span><span style="color: #080;">&#41;</span>, <span style="color: #ff0000;">&quot;Population Size&quot;</span> <span style="color: #080;">=</span>
                             <span style="color: #0000FF; font-weight: bold;">length</span><span style="color: #080;">&#40;</span>k<span style="color: #080;">&#91;</span><span style="color: #080;">&#91;</span>i<span style="color: #080;">&#93;</span><span style="color: #080;">&#93;</span><span style="color: #080;">&#41;</span>, <span style="color: #ff0000;">&quot;Sample Size&quot;</span> <span style="color: #080;">=</span> n, Seed <span style="color: #080;">=</span> seed,
                             Sample <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">sample</span><span style="color: #080;">&#40;</span>N, n, ...<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>, <span style="color: #0000FF; font-weight: bold;">class</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">&quot;power.htest&quot;</span><span style="color: #080;">&#41;</span>
        <span style="color: #0000FF; font-weight: bold;">print</span><span style="color: #080;">&#40;</span>pre<span style="color: #080;">&#41;</span>
      <span style="color: #080;">&#125;</span>
    <span style="color: #080;">&#125;</span> <span style="color: #0000FF; font-weight: bold;">else</span> <span style="color: #0000FF; font-weight: bold;">if</span> <span style="color: #080;">&#40;</span>seed <span style="color: #080;">==</span> <span style="color: #ff0000;">&quot;NO&quot;</span> <span style="color: #080;">&amp;&amp;</span> size <span style="color: #080;">&gt;=</span> <span style="color: #ff0000;">1</span><span style="color: #080;">&#41;</span> <span style="color: #080;">&#123;</span>
      <span style="color: #0000FF; font-weight: bold;">for</span> <span style="color: #080;">&#40;</span>i <span style="color: #0000FF; font-weight: bold;">in</span> <span style="color: #ff0000;">1</span><span style="color: #080;">:</span><span style="color: #0000FF; font-weight: bold;">length</span><span style="color: #080;">&#40;</span>k<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span> <span style="color: #080;">&#123;</span>
        N <span style="color: #080;">=</span> k<span style="color: #080;">&#91;</span><span style="color: #080;">&#91;</span>i<span style="color: #080;">&#93;</span><span style="color: #080;">&#93;</span>
        pre <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">structure</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">list</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;Group&quot;</span> <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">names</span><span style="color: #080;">&#40;</span>k<span style="color: #080;">&#91;</span>i<span style="color: #080;">&#93;</span><span style="color: #080;">&#41;</span>, <span style="color: #ff0000;">&quot;Population Size&quot;</span> <span style="color: #080;">=</span>
                             <span style="color: #0000FF; font-weight: bold;">length</span><span style="color: #080;">&#40;</span>k<span style="color: #080;">&#91;</span><span style="color: #080;">&#91;</span>i<span style="color: #080;">&#93;</span><span style="color: #080;">&#93;</span><span style="color: #080;">&#41;</span>, <span style="color: #ff0000;">&quot;Sample Size&quot;</span> <span style="color: #080;">=</span> size, Seed <span style="color: #080;">=</span> seed,
                             Sample <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">sample</span><span style="color: #080;">&#40;</span>N, size, ...<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>,
                             <span style="color: #0000FF; font-weight: bold;">class</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">&quot;power.htest&quot;</span><span style="color: #080;">&#41;</span>
        <span style="color: #0000FF; font-weight: bold;">print</span><span style="color: #080;">&#40;</span>pre<span style="color: #080;">&#41;</span>
      <span style="color: #080;">&#125;</span>
    <span style="color: #080;">&#125;</span> <span style="color: #0000FF; font-weight: bold;">else</span> <span style="color: #0000FF; font-weight: bold;">if</span> <span style="color: #080;">&#40;</span>size <span style="color: #080;">&lt;</span> <span style="color: #ff0000;">1</span><span style="color: #080;">&#41;</span> <span style="color: #080;">&#123;</span>
      <span style="color: #0000FF; font-weight: bold;">for</span> <span style="color: #080;">&#40;</span>i <span style="color: #0000FF; font-weight: bold;">in</span> <span style="color: #ff0000;">1</span><span style="color: #080;">:</span><span style="color: #0000FF; font-weight: bold;">length</span><span style="color: #080;">&#40;</span>k<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span> <span style="color: #080;">&#123;</span>
        <span style="color: #0000FF; font-weight: bold;">set.<span style="">seed</span></span><span style="color: #080;">&#40;</span>seed<span style="color: #080;">&#41;</span>
        N <span style="color: #080;">=</span> k<span style="color: #080;">&#91;</span><span style="color: #080;">&#91;</span>i<span style="color: #080;">&#93;</span><span style="color: #080;">&#93;</span>
        n <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">round</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">length</span><span style="color: #080;">&#40;</span>N<span style="color: #080;">&#41;</span><span style="color: #080;">*</span>size<span style="color: #080;">&#41;</span>
        pre <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">structure</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">list</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;Group&quot;</span> <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">names</span><span style="color: #080;">&#40;</span>k<span style="color: #080;">&#91;</span>i<span style="color: #080;">&#93;</span><span style="color: #080;">&#41;</span>, <span style="color: #ff0000;">&quot;Population Size&quot;</span> <span style="color: #080;">=</span>
                             <span style="color: #0000FF; font-weight: bold;">length</span><span style="color: #080;">&#40;</span>k<span style="color: #080;">&#91;</span><span style="color: #080;">&#91;</span>i<span style="color: #080;">&#93;</span><span style="color: #080;">&#93;</span><span style="color: #080;">&#41;</span>, <span style="color: #ff0000;">&quot;Sample Size&quot;</span> <span style="color: #080;">=</span> n, Seed <span style="color: #080;">=</span> seed,
                             Sample <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">sample</span><span style="color: #080;">&#40;</span>N, n, ...<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>, <span style="color: #0000FF; font-weight: bold;">class</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">&quot;power.htest&quot;</span><span style="color: #080;">&#41;</span>
        <span style="color: #0000FF; font-weight: bold;">print</span><span style="color: #080;">&#40;</span>pre<span style="color: #080;">&#41;</span>
      <span style="color: #080;">&#125;</span>
    <span style="color: #080;">&#125;</span> <span style="color: #0000FF; font-weight: bold;">else</span> <span style="color: #0000FF; font-weight: bold;">if</span> <span style="color: #080;">&#40;</span>size <span style="color: #080;">&gt;=</span> <span style="color: #ff0000;">1</span><span style="color: #080;">&#41;</span> <span style="color: #080;">&#123;</span>
      <span style="color: #0000FF; font-weight: bold;">for</span> <span style="color: #080;">&#40;</span>i <span style="color: #0000FF; font-weight: bold;">in</span> <span style="color: #ff0000;">1</span><span style="color: #080;">:</span><span style="color: #0000FF; font-weight: bold;">length</span><span style="color: #080;">&#40;</span>k<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span> <span style="color: #080;">&#123;</span>
        <span style="color: #0000FF; font-weight: bold;">set.<span style="">seed</span></span><span style="color: #080;">&#40;</span>seed<span style="color: #080;">&#41;</span>
        N <span style="color: #080;">=</span> k<span style="color: #080;">&#91;</span><span style="color: #080;">&#91;</span>i<span style="color: #080;">&#93;</span><span style="color: #080;">&#93;</span>
        pre <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">structure</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">list</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;Group&quot;</span> <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">names</span><span style="color: #080;">&#40;</span>k<span style="color: #080;">&#91;</span>i<span style="color: #080;">&#93;</span><span style="color: #080;">&#41;</span>, <span style="color: #ff0000;">&quot;Population Size&quot;</span> <span style="color: #080;">=</span>
                             <span style="color: #0000FF; font-weight: bold;">length</span><span style="color: #080;">&#40;</span>k<span style="color: #080;">&#91;</span><span style="color: #080;">&#91;</span>i<span style="color: #080;">&#93;</span><span style="color: #080;">&#93;</span><span style="color: #080;">&#41;</span>, <span style="color: #ff0000;">&quot;Sample Size&quot;</span> <span style="color: #080;">=</span> size, Seed <span style="color: #080;">=</span> seed,
                             Sample <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">sample</span><span style="color: #080;">&#40;</span>N, size, ...<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>,
                             <span style="color: #0000FF; font-weight: bold;">class</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">&quot;power.htest&quot;</span><span style="color: #080;">&#41;</span>
        <span style="color: #0000FF; font-weight: bold;">print</span><span style="color: #080;">&#40;</span>pre<span style="color: #080;">&#41;</span>
      <span style="color: #080;">&#125;</span>
    <span style="color: #080;">&#125;</span>
  <span style="color: #080;">&#125;</span> <span style="color: #0000FF; font-weight: bold;">else</span> <span style="color: #080;">&#123;</span>
    <span style="color: #0000FF; font-weight: bold;">if</span> <span style="color: #080;">&#40;</span>seed <span style="color: #080;">==</span> <span style="color: #ff0000;">&quot;NO&quot;</span> <span style="color: #080;">&amp;</span> size <span style="color: #080;">&lt;</span> <span style="color: #ff0000;">1</span><span style="color: #080;">&#41;</span> <span style="color: #080;">&#123;</span>
      <span style="color: #0000FF; font-weight: bold;">for</span> <span style="color: #080;">&#40;</span>i <span style="color: #0000FF; font-weight: bold;">in</span> <span style="color: #ff0000;">1</span><span style="color: #080;">:</span><span style="color: #0000FF; font-weight: bold;">length</span><span style="color: #080;">&#40;</span>k<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span> <span style="color: #080;">&#123;</span>
        N <span style="color: #080;">=</span> k<span style="color: #080;">&#91;</span><span style="color: #080;">&#91;</span>i<span style="color: #080;">&#93;</span><span style="color: #080;">&#93;</span>
        n <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">round</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">length</span><span style="color: #080;">&#40;</span>N<span style="color: #080;">&#41;</span><span style="color: #080;">*</span>size<span style="color: #080;">&#41;</span>
        res <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">data.<span style="">frame</span></span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">names</span><span style="color: #080;">&#40;</span>k<span style="color: #080;">&#91;</span>i<span style="color: #080;">&#93;</span><span style="color: #080;">&#41;</span>, <span style="color: #0000FF; font-weight: bold;">sample</span><span style="color: #080;">&#40;</span>N, n, ...<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
        <span style="color: #0000FF; font-weight: bold;">names</span><span style="color: #080;">&#40;</span>res<span style="color: #080;">&#41;</span> <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">c</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;Group&quot;</span>, <span style="color: #ff0000;">&quot;Samples&quot;</span><span style="color: #080;">&#41;</span>
        <span style="color: #0000FF; font-weight: bold;">print</span><span style="color: #080;">&#40;</span>res<span style="color: #080;">&#41;</span>
      <span style="color: #080;">&#125;</span>
    <span style="color: #080;">&#125;</span> <span style="color: #0000FF; font-weight: bold;">else</span> <span style="color: #0000FF; font-weight: bold;">if</span> <span style="color: #080;">&#40;</span>seed <span style="color: #080;">==</span> <span style="color: #ff0000;">&quot;NO&quot;</span> <span style="color: #080;">&amp;</span> size <span style="color: #080;">&gt;=</span> <span style="color: #ff0000;">1</span><span style="color: #080;">&#41;</span> <span style="color: #080;">&#123;</span>
      <span style="color: #0000FF; font-weight: bold;">for</span> <span style="color: #080;">&#40;</span>i <span style="color: #0000FF; font-weight: bold;">in</span> <span style="color: #ff0000;">1</span><span style="color: #080;">:</span><span style="color: #0000FF; font-weight: bold;">length</span><span style="color: #080;">&#40;</span>k<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span> <span style="color: #080;">&#123;</span>
        N <span style="color: #080;">=</span> k<span style="color: #080;">&#91;</span><span style="color: #080;">&#91;</span>i<span style="color: #080;">&#93;</span><span style="color: #080;">&#93;</span>
        res <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">data.<span style="">frame</span></span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">names</span><span style="color: #080;">&#40;</span>k<span style="color: #080;">&#91;</span>i<span style="color: #080;">&#93;</span><span style="color: #080;">&#41;</span>, <span style="color: #0000FF; font-weight: bold;">sample</span><span style="color: #080;">&#40;</span>N, size, ...<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
        <span style="color: #0000FF; font-weight: bold;">names</span><span style="color: #080;">&#40;</span>res<span style="color: #080;">&#41;</span> <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">c</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;Group&quot;</span>, <span style="color: #ff0000;">&quot;Samples&quot;</span><span style="color: #080;">&#41;</span>
        <span style="color: #0000FF; font-weight: bold;">print</span><span style="color: #080;">&#40;</span>res<span style="color: #080;">&#41;</span>
      <span style="color: #080;">&#125;</span>
    <span style="color: #080;">&#125;</span> <span style="color: #0000FF; font-weight: bold;">else</span> <span style="color: #0000FF; font-weight: bold;">if</span> <span style="color: #080;">&#40;</span>size <span style="color: #080;">&lt;</span> <span style="color: #ff0000;">1</span><span style="color: #080;">&#41;</span> <span style="color: #080;">&#123;</span>
      <span style="color: #0000FF; font-weight: bold;">for</span> <span style="color: #080;">&#40;</span>i <span style="color: #0000FF; font-weight: bold;">in</span> <span style="color: #ff0000;">1</span><span style="color: #080;">:</span><span style="color: #0000FF; font-weight: bold;">length</span><span style="color: #080;">&#40;</span>k<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span> <span style="color: #080;">&#123;</span>
        <span style="color: #0000FF; font-weight: bold;">set.<span style="">seed</span></span><span style="color: #080;">&#40;</span>seed<span style="color: #080;">&#41;</span>
        N <span style="color: #080;">=</span> k<span style="color: #080;">&#91;</span><span style="color: #080;">&#91;</span>i<span style="color: #080;">&#93;</span><span style="color: #080;">&#93;</span>
        n <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">round</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">length</span><span style="color: #080;">&#40;</span>N<span style="color: #080;">&#41;</span><span style="color: #080;">*</span>size<span style="color: #080;">&#41;</span>
        res <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">data.<span style="">frame</span></span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">names</span><span style="color: #080;">&#40;</span>k<span style="color: #080;">&#91;</span>i<span style="color: #080;">&#93;</span><span style="color: #080;">&#41;</span>, <span style="color: #0000FF; font-weight: bold;">sample</span><span style="color: #080;">&#40;</span>N, n, ...<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
        <span style="color: #0000FF; font-weight: bold;">names</span><span style="color: #080;">&#40;</span>res<span style="color: #080;">&#41;</span> <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">c</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;Group&quot;</span>, <span style="color: #ff0000;">&quot;Samples&quot;</span><span style="color: #080;">&#41;</span>
        <span style="color: #0000FF; font-weight: bold;">print</span><span style="color: #080;">&#40;</span>res<span style="color: #080;">&#41;</span>
      <span style="color: #080;">&#125;</span>
    <span style="color: #080;">&#125;</span> <span style="color: #0000FF; font-weight: bold;">else</span> <span style="color: #0000FF; font-weight: bold;">if</span> <span style="color: #080;">&#40;</span>size <span style="color: #080;">&gt;=</span> <span style="color: #ff0000;">1</span><span style="color: #080;">&#41;</span> <span style="color: #080;">&#123;</span>
      <span style="color: #0000FF; font-weight: bold;">for</span> <span style="color: #080;">&#40;</span>i <span style="color: #0000FF; font-weight: bold;">in</span> <span style="color: #ff0000;">1</span><span style="color: #080;">:</span><span style="color: #0000FF; font-weight: bold;">length</span><span style="color: #080;">&#40;</span>k<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span> <span style="color: #080;">&#123;</span>
        <span style="color: #0000FF; font-weight: bold;">set.<span style="">seed</span></span><span style="color: #080;">&#40;</span>seed<span style="color: #080;">&#41;</span>
        N <span style="color: #080;">=</span> k<span style="color: #080;">&#91;</span><span style="color: #080;">&#91;</span>i<span style="color: #080;">&#93;</span><span style="color: #080;">&#93;</span>
        res <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">data.<span style="">frame</span></span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">names</span><span style="color: #080;">&#40;</span>k<span style="color: #080;">&#91;</span>i<span style="color: #080;">&#93;</span><span style="color: #080;">&#41;</span>, <span style="color: #0000FF; font-weight: bold;">sample</span><span style="color: #080;">&#40;</span>N, size, ...<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
        <span style="color: #0000FF; font-weight: bold;">names</span><span style="color: #080;">&#40;</span>res<span style="color: #080;">&#41;</span> <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">c</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;Group&quot;</span>, <span style="color: #ff0000;">&quot;Samples&quot;</span><span style="color: #080;">&#41;</span>
        <span style="color: #0000FF; font-weight: bold;">print</span><span style="color: #080;">&#40;</span>res<span style="color: #080;">&#41;</span>
      <span style="color: #080;">&#125;</span>
    <span style="color: #080;">&#125;</span>
  <span style="color: #080;">&#125;</span>
<span style="color: #080;">&#125;</span></pre></td></tr></table></div><p>You can load the function by typing:</p><div
class="wp_codebox"><table><tr
id="p11418"><td
class="code" id="p1141code8"><pre class="rsplus" style="font-family:monospace;"><span style="color: #080;">&gt;</span> <span style="color: #0000FF; font-weight: bold;">source</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;http://news.mrdwab.com/stratified-beta&quot;</span><span style="color: #080;">&#41;</span></pre></td></tr></table></div><h2>And now, to test it&#8230;</h2><p>Let&#8217;s generate some dummy data and see what we can come up with. The function takes the following arguments, in this order:</p><ul><li><code>df</code>: The source data frame, with the IDs in the first column and the groups in the second column.</li><li><code>size</code>: The sample size you want, either as a proportion of each group (for proportional sampling, expressed as a decimal below 1) or as a whole number of samples to take from each group.</li><li><code>seed</code>: The seed you want to use. If you don&#8217;t want to use a seed, enter <code>"NO"</code> (with quotes).</li><li><code>dframe</code>: The format you want the output in, either a list or a data frame. Defaults to a list (<code>dframe=FALSE</code>), which is the better choice at the moment since the data frame option is not yet working the way I expect it to.</li></ul><p>Any further arguments, such as <code>replace=TRUE</code>, are passed on to <code>sample()</code>.</p><div
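><p>To make the argument handling above concrete, here is a minimal sketch of how <code>size</code> and <code>seed</code> could be interpreted, and of one way to return a single data frame. It is not the <code>stratified</code> function from this post: the name <code>stratified_sketch</code>, the use of <code>lapply()</code> and <code>rbind()</code>, and the dummy data with fixed group sizes are all assumptions made for illustration.</p><pre class="rsplus" style="font-family:monospace;"># Hypothetical sketch; this is not the stratified() function from this post.
stratified_sketch = function(df, size, seed = "NO", ...) {
  # Column 1 holds the IDs and column 2 the groups.
  groups = split(df[[1]], df[[2]])
  pieces = lapply(names(groups), function(g) {
    ids = groups[[g]]
    # A size below 1 is a proportion of the group; otherwise a per-group count.
    n = if (size &lt; 1) round(length(ids) * size) else size
    # Reset the seed for every group unless seed is "NO".
    if (!identical(seed, "NO")) set.seed(seed)
    # Sample positions so that a one-element group is handled correctly.
    picked = ids[sample(length(ids), n, ...)]
    data.frame(Group = rep(g, n), Samples = picked)
  })
  # Stack the per-group pieces into a single data frame.
  do.call(rbind, pieces)
}

# Dummy data with known group sizes (26, 27, 20 and 27).
z = data.frame(a = 1:100, b = rep(c("a", "b", "c", "d"), times = c(26, 27, 20, 27)))
out = stratified_sketch(z, .15, 1)
table(out$Group)</pre><p>With these group sizes, a 15% sample works out to 4, 4, 3 and 4 rows per group, since each group size is multiplied by <code>size</code> and rounded. The sampled IDs themselves depend on your version of R, so none are shown here.</p></div><div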
class="wp_codebox"><table><tr
id="p11419"><td
class="code" id="p1141code9"><pre class="rsplus" style="font-family:monospace;"><span style="color: #080;">&gt;</span> <span style="color: #228B22;"># Generate some data</span>
<span style="color: #080;">&gt;</span> a <span style="color: #080;">=</span> <span style="color: #ff0000;">1</span><span style="color: #080;">:</span><span style="color: #ff0000;">100</span>
<span style="color: #080;">&gt;</span> <span style="color: #0000FF; font-weight: bold;">set.<span style="">seed</span></span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">123</span><span style="color: #080;">&#41;</span>
<span style="color: #080;">&gt;</span> b <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">sample</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">c</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;a&quot;</span>, <span style="color: #ff0000;">&quot;b&quot;</span>, <span style="color: #ff0000;">&quot;c&quot;</span>, <span style="color: #ff0000;">&quot;d&quot;</span><span style="color: #080;">&#41;</span>, <span style="color: #ff0000;">100</span>, <span style="color: #0000FF; font-weight: bold;">replace</span> <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">T</span><span style="color: #080;">&#41;</span>
<span style="color: #080;">&gt;</span> z <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">data.<span style="">frame</span></span><span style="color: #080;">&#40;</span>a, b<span style="color: #080;">&#41;</span>
<span style="color: #080;">&gt;</span> <span style="color: #228B22;"># Check how big each group is</span>
<span style="color: #080;">&gt;</span> <span style="color: #0000FF; font-weight: bold;">table</span><span style="color: #080;">&#40;</span>z$b<span style="color: #080;">&#41;</span>
&nbsp;
 a  b  <span style="color: #0000FF; font-weight: bold;">c</span>  d
<span style="color: #ff0000;">26</span> <span style="color: #ff0000;">27</span> <span style="color: #ff0000;">20</span> <span style="color: #ff0000;">27</span>
<span style="color: #080;">&gt;</span> <span style="color: #228B22;"># Make sure the function is loaded before you continue!</span>
<span style="color: #080;">&gt;</span> <span style="color: #228B22;"># source(&quot;http://news.mrdwab.com/stratified-beta&quot;)</span>
<span style="color: #080;">&gt;</span> <span style="color: #228B22;"># Take a 15% sample and use a seed of 1</span>
<span style="color: #080;">&gt;</span> stratified<span style="color: #080;">&#40;</span>z, .15, <span style="color: #ff0000;">1</span><span style="color: #080;">&#41;</span>
&nbsp;
&nbsp;
&nbsp;
          Group <span style="color: #080;">=</span> a
Population Size <span style="color: #080;">=</span> <span style="color: #ff0000;">26</span>
    Sample Size <span style="color: #080;">=</span> <span style="color: #ff0000;">4</span>
           Seed <span style="color: #080;">=</span> <span style="color: #ff0000;">1</span>
         Sample <span style="color: #080;">=</span> <span style="color: #ff0000;">38</span>, <span style="color: #ff0000;">45</span>, <span style="color: #ff0000;">54</span>, <span style="color: #ff0000;">81</span>
&nbsp;
&nbsp;
&nbsp;
&nbsp;
          Group <span style="color: #080;">=</span> b
Population Size <span style="color: #080;">=</span> <span style="color: #ff0000;">27</span>
    Sample Size <span style="color: #080;">=</span> <span style="color: #ff0000;">4</span>
           Seed <span style="color: #080;">=</span> <span style="color: #ff0000;">1</span>
         Sample <span style="color: #080;">=</span> <span style="color: #ff0000;">39</span>, <span style="color: #ff0000;">43</span>, <span style="color: #ff0000;">60</span>, <span style="color: #ff0000;">79</span>
&nbsp;
&nbsp;
&nbsp;
&nbsp;
          Group <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">c</span>
Population Size <span style="color: #080;">=</span> <span style="color: #ff0000;">20</span>
    Sample Size <span style="color: #080;">=</span> <span style="color: #ff0000;">3</span>
           Seed <span style="color: #080;">=</span> <span style="color: #ff0000;">1</span>
         Sample <span style="color: #080;">=</span> <span style="color: #ff0000;">23</span>, <span style="color: #ff0000;">26</span>, <span style="color: #ff0000;">33</span>
&nbsp;
&nbsp;
&nbsp;
&nbsp;
          Group <span style="color: #080;">=</span> d
Population Size <span style="color: #080;">=</span> <span style="color: #ff0000;">27</span>
    Sample Size <span style="color: #080;">=</span> <span style="color: #ff0000;">4</span>
           Seed <span style="color: #080;">=</span> <span style="color: #ff0000;">1</span>
         Sample <span style="color: #080;">=</span> <span style="color: #ff0000;">21</span>, <span style="color: #ff0000;">31</span>, <span style="color: #ff0000;">53</span>, <span style="color: #ff0000;">71</span>
&nbsp;
<span style="color: #080;">&gt;</span> <span style="color: #228B22;"># Take a sample of 5 from each group and use a seed of 1</span>
<span style="color: #080;">&gt;</span> stratified<span style="color: #080;">&#40;</span>z, <span style="color: #ff0000;">5</span>, <span style="color: #ff0000;">1</span><span style="color: #080;">&#41;</span>
&nbsp;
&nbsp;
&nbsp;
          Group <span style="color: #080;">=</span> a
Population Size <span style="color: #080;">=</span> <span style="color: #ff0000;">26</span>
    Sample Size <span style="color: #080;">=</span> <span style="color: #ff0000;">5</span>
           Seed <span style="color: #080;">=</span> <span style="color: #ff0000;">1</span>
         Sample <span style="color: #080;">=</span> <span style="color: #ff0000;">38</span>, <span style="color: #ff0000;">45</span>, <span style="color: #ff0000;">54</span>, <span style="color: #ff0000;">81</span>, <span style="color: #ff0000;">30</span>
&nbsp;
&nbsp;
&nbsp;
&nbsp;
          Group <span style="color: #080;">=</span> b
Population Size <span style="color: #080;">=</span> <span style="color: #ff0000;">27</span>
    Sample Size <span style="color: #080;">=</span> <span style="color: #ff0000;">5</span>
           Seed <span style="color: #080;">=</span> <span style="color: #ff0000;">1</span>
         Sample <span style="color: #080;">=</span> <span style="color: #ff0000;">39</span>, <span style="color: #ff0000;">43</span>, <span style="color: #ff0000;">60</span>, <span style="color: #ff0000;">79</span>, <span style="color: #ff0000;">19</span>
&nbsp;
&nbsp;
&nbsp;
&nbsp;
          Group <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">c</span>
Population Size <span style="color: #080;">=</span> <span style="color: #ff0000;">20</span>
    Sample Size <span style="color: #080;">=</span> <span style="color: #ff0000;">5</span>
           Seed <span style="color: #080;">=</span> <span style="color: #ff0000;">1</span>
         Sample <span style="color: #080;">=</span> <span style="color: #ff0000;">23</span>, <span style="color: #ff0000;">26</span>, <span style="color: #ff0000;">33</span>, <span style="color: #ff0000;">78</span>, <span style="color: #ff0000;">14</span>
&nbsp;
&nbsp;
&nbsp;
&nbsp;
          Group <span style="color: #080;">=</span> d
Population Size <span style="color: #080;">=</span> <span style="color: #ff0000;">27</span>
    Sample Size <span style="color: #080;">=</span> <span style="color: #ff0000;">5</span>
           Seed <span style="color: #080;">=</span> <span style="color: #ff0000;">1</span>
         Sample <span style="color: #080;">=</span> <span style="color: #ff0000;">21</span>, <span style="color: #ff0000;">31</span>, <span style="color: #ff0000;">53</span>, <span style="color: #ff0000;">71</span>, <span style="color: #ff0000;">11</span>
&nbsp;
<span style="color: #080;">&gt;</span> <span style="color: #228B22;"># Take a sample of 15 from each group, with replacement, and a seed of 1</span>
<span style="color: #080;">&gt;</span> stratified<span style="color: #080;">&#40;</span>z, <span style="color: #ff0000;">15</span>, <span style="color: #ff0000;">1</span>, <span style="color: #0000FF; font-weight: bold;">replace</span><span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">T</span><span style="color: #080;">&#41;</span>
&nbsp;
&nbsp;
&nbsp;
          Group <span style="color: #080;">=</span> a
Population Size <span style="color: #080;">=</span> <span style="color: #ff0000;">26</span>
    Sample Size <span style="color: #080;">=</span> <span style="color: #ff0000;">15</span>
           Seed <span style="color: #080;">=</span> <span style="color: #ff0000;">1</span>
         Sample <span style="color: #080;">=</span> <span style="color: #ff0000;">38</span>, <span style="color: #ff0000;">45</span>, <span style="color: #ff0000;">56</span>, <span style="color: #ff0000;">91</span>, <span style="color: #ff0000;">35</span>, <span style="color: #ff0000;">91</span>, <span style="color: #ff0000;">96</span>, <span style="color: #ff0000;">74</span>, <span style="color: #ff0000;">62</span>, <span style="color: #ff0000;">15</span>, <span style="color: #ff0000;">35</span>, <span style="color: #ff0000;">30</span>, <span style="color: #ff0000;">74</span>, <span style="color: #ff0000;">45</span>, <span style="color: #ff0000;">81</span>
&nbsp;
&nbsp;
&nbsp;
&nbsp;
          Group <span style="color: #080;">=</span> b
Population Size <span style="color: #080;">=</span> <span style="color: #ff0000;">27</span>
    Sample Size <span style="color: #080;">=</span> <span style="color: #ff0000;">15</span>
           Seed <span style="color: #080;">=</span> <span style="color: #ff0000;">1</span>
         Sample <span style="color: #080;">=</span> <span style="color: #ff0000;">39</span>, <span style="color: #ff0000;">44</span>, <span style="color: #ff0000;">63</span>, <span style="color: #ff0000;">93</span>, <span style="color: #ff0000;">29</span>, <span style="color: #ff0000;">93</span>, <span style="color: #ff0000;">95</span>, <span style="color: #ff0000;">66</span>, <span style="color: #ff0000;">64</span>, <span style="color: #ff0000;">3</span>, <span style="color: #ff0000;">29</span>, <span style="color: #ff0000;">19</span>, <span style="color: #ff0000;">70</span>, <span style="color: #ff0000;">44</span>, <span style="color: #ff0000;">77</span>
&nbsp;
&nbsp;
&nbsp;
&nbsp;
          Group <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">c</span>
Population Size <span style="color: #080;">=</span> <span style="color: #ff0000;">20</span>
    Sample Size <span style="color: #080;">=</span> <span style="color: #ff0000;">15</span>
           Seed <span style="color: #080;">=</span> <span style="color: #ff0000;">1</span>
         Sample <span style="color: #080;">=</span> <span style="color: #ff0000;">23</span>, <span style="color: #ff0000;">26</span>, <span style="color: #ff0000;">55</span>, <span style="color: #ff0000;">94</span>, <span style="color: #ff0000;">22</span>, <span style="color: #ff0000;">92</span>, <span style="color: #ff0000;">94</span>, <span style="color: #ff0000;">72</span>, <span style="color: #ff0000;">61</span>, <span style="color: #ff0000;">9</span>, <span style="color: #ff0000;">22</span>, <span style="color: #ff0000;">14</span>, <span style="color: #ff0000;">72</span>, <span style="color: #ff0000;">26</span>, <span style="color: #ff0000;">78</span>
&nbsp;
&nbsp;
&nbsp;
&nbsp;
          Group <span style="color: #080;">=</span> d
Population Size <span style="color: #080;">=</span> <span style="color: #ff0000;">27</span>
    Sample Size <span style="color: #080;">=</span> <span style="color: #ff0000;">15</span>
           Seed <span style="color: #080;">=</span> <span style="color: #ff0000;">1</span>
         Sample <span style="color: #080;">=</span> <span style="color: #ff0000;">21</span>, <span style="color: #ff0000;">32</span>, <span style="color: #ff0000;">58</span>, <span style="color: #ff0000;">88</span>, <span style="color: #ff0000;">16</span>, <span style="color: #ff0000;">88</span>, <span style="color: #ff0000;">89</span>, <span style="color: #ff0000;">65</span>, <span style="color: #ff0000;">59</span>, <span style="color: #ff0000;">4</span>, <span style="color: #ff0000;">16</span>, <span style="color: #ff0000;">11</span>, <span style="color: #ff0000;">67</span>, <span style="color: #ff0000;">32</span>, <span style="color: #ff0000;">69</span>
&nbsp;
<span style="color: #080;">&gt;</span> <span style="color: #228B22;"># Take a sample of 10% from each group, using a seed of 1,</span>
<span style="color: #080;">&gt;</span> <span style="color: #228B22;"># and display the output as a data frame</span>
<span style="color: #080;">&gt;</span> stratified<span style="color: #080;">&#40;</span>z, .1, <span style="color: #ff0000;">1</span>, dframe<span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">T</span><span style="color: #080;">&#41;</span>
  Group Samples
<span style="color: #ff0000;">1</span>     a      <span style="color: #ff0000;">38</span>
<span style="color: #ff0000;">2</span>     a      <span style="color: #ff0000;">45</span>
<span style="color: #ff0000;">3</span>     a      <span style="color: #ff0000;">54</span>
  Group Samples
<span style="color: #ff0000;">1</span>     b      <span style="color: #ff0000;">39</span>
<span style="color: #ff0000;">2</span>     b      <span style="color: #ff0000;">43</span>
<span style="color: #ff0000;">3</span>     b      <span style="color: #ff0000;">60</span>
  Group Samples
<span style="color: #ff0000;">1</span>     <span style="color: #0000FF; font-weight: bold;">c</span>      <span style="color: #ff0000;">23</span>
<span style="color: #ff0000;">2</span>     <span style="color: #0000FF; font-weight: bold;">c</span>      <span style="color: #ff0000;">26</span>
  Group Samples
<span style="color: #ff0000;">1</span>     d      <span style="color: #ff0000;">21</span>
<span style="color: #ff0000;">2</span>     d      <span style="color: #ff0000;">31</span>
<span style="color: #ff0000;">3</span>     d      <span style="color: #ff0000;">53</span></pre></td></tr></table></div><h2>Replicating the results from tapply()</h2><p>I mentioned earlier that the results are different from what you would get if you were to use the <code>tapply()</code> function. That is because <code>stratified</code> calls <code>set.seed()</code> again for every group, whereas <code>set.seed()</code> followed by <code>tapply()</code> seeds the random number generator only once. It is easy to get the same results with the <code>stratified</code> function, though: simply move your seed outside of the function. Enter <code>seed</code> as <code>"NO"</code> (with quotes) and call <code>set.seed()</code> beforehand, as you normally would.</p><div
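><p>The effect of where the seed is set can be sketched without the <code>stratified</code> function at all. The snippet below is only an illustration under my own assumptions: it uses plain <code>lapply()</code> on the same kind of dummy data, and the names <code>seed_once</code> and <code>seed_each</code> are hypothetical.</p><pre class="rsplus" style="font-family:monospace;"># Dummy data: 100 IDs randomly assigned to four groups.
a = 1:100
set.seed(123)
b = sample(c("a", "b", "c", "d"), 100, replace = TRUE)
groups = split(a, b)

# Seed set once, outside the loop: each group continues the random stream.
set.seed(1)
seed_once = lapply(groups, sample, size = 4)

# Seed reset inside the loop: every group restarts from the same state.
seed_each = lapply(groups, function(ids) {
  set.seed(1)
  sample(ids, 4)
})

# Only the first group is guaranteed to agree between the two approaches.
identical(seed_once[[1]], seed_each[[1]])</pre><p>The first group matches because it is the first draw after seeding in both cases. Later groups generally differ, because in the seed-once version they pick up the random stream where the previous group left it.</p></div><div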
class="wp_codebox"><table><tr
id="p114110"><td
class="code" id="p1141code10"><pre class="rsplus" style="font-family:monospace;"><span style="color: #080;">&gt;</span> <span style="color: #228B22;"># See what tapply() gives us</span>
<span style="color: #080;">&gt;</span> <span style="color: #0000FF; font-weight: bold;">set.<span style="">seed</span></span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">1</span><span style="color: #080;">&#41;</span><span style="color: #080;">;</span> <span style="color: #0000FF; font-weight: bold;">tapply</span><span style="color: #080;">&#40;</span>z$a, z$b, <span style="color: #0000FF; font-weight: bold;">sample</span>, size <span style="color: #080;">=</span> <span style="color: #ff0000;">4</span><span style="color: #080;">&#41;</span>
$a
<span style="color: #080;">&#91;</span><span style="color: #ff0000;">1</span><span style="color: #080;">&#93;</span> <span style="color: #ff0000;">38</span> <span style="color: #ff0000;">45</span> <span style="color: #ff0000;">54</span> <span style="color: #ff0000;">81</span>
&nbsp;
$b
<span style="color: #080;">&#91;</span><span style="color: #ff0000;">1</span><span style="color: #080;">&#93;</span> <span style="color: #ff0000;">29</span> <span style="color: #ff0000;">86</span> <span style="color: #ff0000;">95</span> <span style="color: #ff0000;">63</span>
&nbsp;
$c
<span style="color: #080;">&#91;</span><span style="color: #ff0000;">1</span><span style="color: #080;">&#93;</span> <span style="color: #ff0000;">61</span>  <span style="color: #ff0000;">9</span> <span style="color: #ff0000;">14</span> <span style="color: #ff0000;">92</span>
&nbsp;
$d
<span style="color: #080;">&#91;</span><span style="color: #ff0000;">1</span><span style="color: #080;">&#93;</span> <span style="color: #ff0000;">67</span> <span style="color: #ff0000;">31</span> <span style="color: #ff0000;">68</span> <span style="color: #ff0000;">34</span>
&nbsp;
<span style="color: #080;">&gt;</span> <span style="color: #228B22;"># The normal usage for the stratified function</span>
<span style="color: #080;">&gt;</span> stratified<span style="color: #080;">&#40;</span>z, <span style="color: #ff0000;">4</span>, <span style="color: #ff0000;">1</span>, dframe <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">T</span><span style="color: #080;">&#41;</span>
  Group Samples
<span style="color: #ff0000;">1</span>     a      <span style="color: #ff0000;">38</span>
<span style="color: #ff0000;">2</span>     a      <span style="color: #ff0000;">45</span>
<span style="color: #ff0000;">3</span>     a      <span style="color: #ff0000;">54</span>
<span style="color: #ff0000;">4</span>     a      <span style="color: #ff0000;">81</span>
  Group Samples
<span style="color: #ff0000;">1</span>     b      <span style="color: #ff0000;">39</span>
<span style="color: #ff0000;">2</span>     b      <span style="color: #ff0000;">43</span>
<span style="color: #ff0000;">3</span>     b      <span style="color: #ff0000;">60</span>
<span style="color: #ff0000;">4</span>     b      <span style="color: #ff0000;">79</span>
  Group Samples
<span style="color: #ff0000;">1</span>     <span style="color: #0000FF; font-weight: bold;">c</span>      <span style="color: #ff0000;">23</span>
<span style="color: #ff0000;">2</span>     <span style="color: #0000FF; font-weight: bold;">c</span>      <span style="color: #ff0000;">26</span>
<span style="color: #ff0000;">3</span>     <span style="color: #0000FF; font-weight: bold;">c</span>      <span style="color: #ff0000;">33</span>
<span style="color: #ff0000;">4</span>     <span style="color: #0000FF; font-weight: bold;">c</span>      <span style="color: #ff0000;">78</span>
  Group Samples
<span style="color: #ff0000;">1</span>     d      <span style="color: #ff0000;">21</span>
<span style="color: #ff0000;">2</span>     d      <span style="color: #ff0000;">31</span>
<span style="color: #ff0000;">3</span>     d      <span style="color: #ff0000;">53</span>
<span style="color: #ff0000;">4</span>     d      <span style="color: #ff0000;">71</span>
<span style="color: #080;">&gt;</span> <span style="color: #228B22;"># Getting the same results as tapply()</span>
<span style="color: #080;">&gt;</span> <span style="color: #228B22;"># Set the seed before using the function,</span>
<span style="color: #080;">&gt;</span> <span style="color: #228B22;"># and set the seed for the function as &quot;NO&quot;</span>
<span style="color: #080;">&gt;</span> <span style="color: #0000FF; font-weight: bold;">set.<span style="">seed</span></span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">1</span><span style="color: #080;">&#41;</span><span style="color: #080;">;</span> stratified<span style="color: #080;">&#40;</span>z, <span style="color: #ff0000;">4</span>, <span style="color: #ff0000;">&quot;NO&quot;</span>, dframe <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">T</span><span style="color: #080;">&#41;</span>
  Group Samples
<span style="color: #ff0000;">1</span>     a      <span style="color: #ff0000;">38</span>
<span style="color: #ff0000;">2</span>     a      <span style="color: #ff0000;">45</span>
<span style="color: #ff0000;">3</span>     a      <span style="color: #ff0000;">54</span>
<span style="color: #ff0000;">4</span>     a      <span style="color: #ff0000;">81</span>
  Group Samples
<span style="color: #ff0000;">1</span>     b      <span style="color: #ff0000;">29</span>
<span style="color: #ff0000;">2</span>     b      <span style="color: #ff0000;">86</span>
<span style="color: #ff0000;">3</span>     b      <span style="color: #ff0000;">95</span>
<span style="color: #ff0000;">4</span>     b      <span style="color: #ff0000;">63</span>
  Group Samples
<span style="color: #ff0000;">1</span>     <span style="color: #0000FF; font-weight: bold;">c</span>      <span style="color: #ff0000;">61</span>
<span style="color: #ff0000;">2</span>     <span style="color: #0000FF; font-weight: bold;">c</span>       <span style="color: #ff0000;">9</span>
<span style="color: #ff0000;">3</span>     <span style="color: #0000FF; font-weight: bold;">c</span>      <span style="color: #ff0000;">14</span>
<span style="color: #ff0000;">4</span>     <span style="color: #0000FF; font-weight: bold;">c</span>      <span style="color: #ff0000;">92</span>
  Group Samples
<span style="color: #ff0000;">1</span>     d      <span style="color: #ff0000;">67</span>
<span style="color: #ff0000;">2</span>     d      <span style="color: #ff0000;">31</span>
<span style="color: #ff0000;">3</span>     d      <span style="color: #ff0000;">68</span>
<span style="color: #ff0000;">4</span>     d      <span style="color: #ff0000;">34</span></pre></td></tr></table></div><h2>The unfortunate&#8230;</h2><p>There are some advantages to each of the output formats. I&#8217;ve set up the list to be quite verbose, which is useful with the proportionate sampling since it shows us how many samples have been taken from each group. The data frame output format, on the other hand, is quite compact.</p><p>What I still need to figure out, though, is why R won&#8217;t store my output. I suspect that it has something to do with how my loops are set up. I assume that somewhere, I need to add something like an rbind command.</p><p>When the time is right, I will be sure to post what I&#8217;ve found.</p> ]]></content:encoded> <wfw:commentRss>http://news.mrdwab.com/2011/05/15/stratified-random-sampling-in-r-beta/feed/</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item><title>Sounds interesting. Is that a regular expression?</title><link>http://news.mrdwab.com/2011/03/07/sounds-interesting-is-that-a-regular-expression/</link> <comments>http://news.mrdwab.com/2011/03/07/sounds-interesting-is-that-a-regular-expression/#comments</comments> <pubDate>Mon, 07 Mar 2011 05:44:09 +0000</pubDate> <dc:creator>Ananda</dc:creator> <category><![CDATA[(all categories)]]></category> <category><![CDATA[Geekiness]]></category> <category><![CDATA[Useless Knowledge]]></category> <category><![CDATA[R]]></category> <category><![CDATA[regular expressions]]></category> <category><![CDATA[tutorial]]></category> <guid
isPermaLink="false">http://news.mrdwab.com/?p=884</guid> <description><![CDATA[I&#8217;ve been meaning to learn how to use regular expressions for quite some time now, but just never seemed to get around to doing so. The other night, I decided to take a stab at them though, and over the past few days, I&#8217;ve sort of managed to learn a few tricks. Some of these [...]]]></description> <content:encoded><![CDATA[<p>I&#8217;ve been meaning to learn how to use regular expressions for quite some time now, but just never seemed to get around to doing so. The other night, I decided to take a stab at them though, and over the past few days, I&#8217;ve sort of managed to learn a few tricks. Some of these might seem unnecessary, particularly since the examples comprise relatively small chunks of text. But, hopefully you can also see the application of the same techniques for larger text files. In some of the examples, I&#8217;ve also included how it might help with preparing your data for use with a program like R. For all of these examples, I&#8217;ve used Geany as my text editor. I suggest you use a good text editor like <a
href="http://www.geany.org/" target="_blank">Geany</a> or <a
href="http://notepad-plus-plus.org/" target="_blank">Notepad++</a> too.</p><p><span
id="more-884"></span></p><h2>Example 1: Changing numeric date formats</h2><p>Imagine we&#8217;re given a file containing dates in the form of m(m)/d(d)/yyyy and someone gives us totally arbitrary instructions to change it to Y:yyyy (tab) M:mm (tab) D:dd (really, I can&#8217;t tell you who or why.)</p><p>Below is our starting text. Note that some of the months and days have only one digit, while some single digit dates are entered with a preceding zero.</p><pre>
09/05/1978
12/11/2003
11/9/2010
3/13/2001
</pre><p>My solution:</p><table
width="100%" cellspacing="12px"><tr><th>Search for this</th><th>Replace with this</th><th
width="250px">Why?</th></tr><tr><td><code>^([0-9])/</code></td><td><code>0\1/</code></td><td
width="250px">To find single digit months and fill with a preceding zero.</td></tr><tr><td><code>/([0-9])/</code></td><td><code>/0\1/</code></td><td
width="250px">To find single digit days and fill with a preceding zero.</td></tr><tr><td><code>^([0-9]+)/([0-9]+)/([0-9]+)</code></td><td><code>Y:\3\tM:\1\tD:\2</code></td><td
width="250px">Separates a date into three sections (month, day, year) that we are able to rearrange as we see fit.</td></tr></table><p>The sections of the search pattern enclosed by parentheses () become &#8220;references&#8221; that we can refer to by their position, counting from the left: <code>\1</code>, <code>\2</code>, and so on. Notice that the last search pattern has three pairs of parentheses. In the first one, <code>([0-9]+)</code>, <code>[0-9]</code> matches a single digit and the <code>+</code> means &#8220;one or more of these&#8221;, so the group collects digits until it reaches the next item in the pattern, which is a forward slash. The <code>^</code> before the first item indicates that the match must begin at the start of a line. Since the dates are month-first, <code>\1</code> holds the month, <code>\2</code> the day, and <code>\3</code> the year.</p><p>If you followed the instructions correctly, you should get the following:</p><pre>
Y:1978	M:09	D:05
Y:2003	M:12	D:11
Y:2010	M:11	D:09
Y:2001	M:03	D:13
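</pre><p>The same three substitutions can also be run from within R rather than a text editor. What follows is only a rough sketch under a few assumptions: the function name <code>reformat_date</code> is made up for this illustration, the input is taken to be month-first (m(m)/d(d)/yyyy) as described above, and because R strings need the backslash doubled, the references are written as <code>\\1</code>, <code>\\2</code>, and <code>\\3</code>.</p>

```r
# Hypothetical helper (not from the original post): converts m(m)/d(d)/yyyy
# strings into "Y:yyyy<TAB>M:mm<TAB>D:dd", mirroring the three editor steps.
reformat_date = function(x) {
  # Step 1: pad a single-digit month at the start of the string.
  x = sub("^([0-9])/", "0\\1/", x)
  # Step 2: pad a single-digit day sitting between the two slashes.
  x = sub("/([0-9])/", "/0\\1/", x)
  # Step 3: capture month, day, and year, then emit them as year, month, day.
  sub("^([0-9]+)/([0-9]+)/([0-9]+)$", "Y:\\3\tM:\\1\tD:\\2", x)
}
```

<p>Each <code>sub()</code> call changes only the first match in each string, which is enough here because every pattern can match at most once in a well-formed date. A call might look like this:</p><pre>
cat(reformat_date(c("09/05/1978", "3/13/2001")), sep = "\n")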
</pre><h2>Example 2: Changing names around</h2><p>The same crazy person that wanted that odd date format also wanted us to change this list of names from being &#8220;First-name Last-name&#8221; to being &#8220;Last-name, First-initial&#8221;. We could do this manually, but why should we? Here&#8217;s what we start with:</p><pre>
Ethan Nakata
Stepanie Foutz
Nikole Pritt
Lesley Ramsay
Lucienne Anderson
Ardith Guo
Kassie Roloff
Kathy Edie
Kellee Rowse
Effie Bensinger
Bethel Gravel
Kathaleen Kovac
Candance Clauss
Sherell Dobrowolski
Kym Thurmon
Xiomara Tocci
Brice Tallon
Natalya Bouldin
Jacki Parise
Evonne Mun
</pre><table
width="100%" cellspacing="12px"><tr><th>Search for this</th><th>Replace with this</th><th
width="250px">Why?</th></tr><tr><td><code>^([A-Z])([a-z]+) ([A-Za-z]+)</code></td><td><code>\3, \1.</code></td><td
width="250px">The first set of parentheses selects just the first initial, the second set, the rest of the first name, the third set, the entire last name. In the replace box, we remove the second reference, and insert a comma and space in between reference three and one, and a period after reference 1.</td></tr></table><p>Here&#8217;s the result:</p><pre>
Nakata, E.
Foutz, S.
Pritt, N.
Ramsay, L.
Anderson, L.
Guo, A.
Roloff, K.
Edie, K.
Rowse, K.
Bensinger, E.
Gravel, B.
Kovac, K.
Clauss, C.
Dobrowolski, S.
Thurmon, K.
Tocci, X.
Tallon, B.
Bouldin, N.
Parise, J.
Mun, E.
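</pre><p>If the list of names is already sitting in R as a character vector, the same pattern can be applied there. This is just a sketch: <code>last_first_initial</code> is a name invented for this example, and the references are again written with doubled backslashes because they live inside an R string.</p>

```r
# Hypothetical helper (not from the original post): turns "First Last"
# into "Last, F." using the same three capture groups as the table above.
last_first_initial = function(x) {
  # Group 1: first initial; group 2: rest of the first name (dropped);
  # group 3: the last name. Strings that do not match are returned unchanged.
  sub("^([A-Z])([a-z]+) ([A-Za-z]+)$", "\\3, \\1.", x)
}
```

<p>One limitation of the pattern, in the editor as well as in R: names containing hyphens, apostrophes, or middle names do not match and are left as they are. A call might look like this:</p><pre>
last_first_initial(c("Ethan Nakata", "Kym Thurmon"))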
</pre><h2>Example 3: Import really ugly data cut and pasted from a PDF into R</h2><p>Someone thought that it would be a good idea to copy a part of <a
href="http://news.mrdwab.com/wp-content/uploads/2011/03/WFA_girls_0_5_percentiles.pdf">this table</a> and send it to you <a
href="http://news.mrdwab.com/wp-content/uploads/2011/03/unprocessed-1.txt">as a text file</a> (again, who knows why&#8230;). You want to import the data into R and use it. Can you do it efficiently? You&#8217;re not actually interested in everything. You are most interested in the &#8220;Month&#8221; column and the values in the columns titled &#8220;1st&#8221; to &#8220;99th&#8221;.</p><div
class="wp_codebox"><table><tr
id="p88418"><td
class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
</pre></td><td
class="code" id="p884code18"><pre class="txt" style="font-family:monospace;">Year: Month
Month
L
M
S
1st
3rd
5th
15th
25th
50th
75th
85th
95th
97th
99th
0: 0
0
0.3809
3.2322
0.14171
2.3
2.4
2.5
2.8
2.9
3.2
3.6
3.7
4.0
4.2
4.4
0: 1
1
0.1714
4.1873
0.13724
3.0
3.2
3.3
3.6
3.8
4.2
4.6
4.8
5.2
5.4
5.7
0: 2
2
0.0962
5.1282
0.13000
3.8
4.0
4.1
4.5
4.7
5.1
5.6
5.9
6.3
6.5
6.9
0: 3
3
0.0402
5.8458
0.12619
4.4
4.6
4.7
5.1
5.4
5.8
6.4
6.7
7.2
7.4
7.8
0: 4
4
-0.0050
6.4237
0.12402
4.8
5.1
5.2
5.6
5.9
6.4
7.0
7.3
7.9
8.1
8.6
0: 5
5
-0.0430
6.8985
0.12274
5.2
5.5
5.6
6.1
6.4
6.9
7.5
7.8
8.4
8.7
9.2
0: 6
6
-0.0756
7.2970
0.12204
5.5
5.8
6.0
6.4
6.7
7.3
7.9
8.3
8.9
9.2
9.7
0: 7
7
-0.1039
7.6422
0.12178
5.8
6.1
6.3
6.7
7.0
7.6
8.3
8.7
9.4
9.6
10.2
0: 8
8
-0.1288
7.9487
0.12181
6.0
6.3
6.5
7.0
7.3
7.9
8.6
9.0
9.7
10.0
10.6
0: 9
9
-0.1507
8.2254
0.12199
6.2
6.6
6.8
7.3
7.6
8.2
8.9
9.3
10.1
10.4
11.0
0:10
10
-0.1700
8.4800
0.12223
6.4
6.8
7.0
7.5
7.8
8.5
9.2
9.6
10.4
10.7
11.3
0:11
11
-0.1872
8.7192
0.12247
6.6
7.0
7.2
7.7
8.0
8.7
9.5
9.9
10.7
11.0
11.7
1: 0
12
-0.2024
8.9481
0.12268
6.8
7.1
7.3
7.9
8.2
8.9
9.7
10.2
11.0
11.3
12.0</pre></td></tr></table></div><p>This can actually be done with three regular expression search-and-replace steps and two or three lines of code in R. First, the regular expressions:</p><table
width="100%" cellspacing="12px"><tr><th>Search for this</th><th>Replace with this</th><th
width="250px">Why?</th></tr><tr><td><code>^([01]:[ 0-9]+)</code></td><td>Nothing</td><td
width="250px">This removes the Year: Month column.</td></tr><tr><td><code>^([0-9]|[0-9-]+)\.([0-9]{4,5})</code></td><td>Nothing</td><td
width="250px">This removes the numeric values for the &#8220;L&#8221;, &#8220;M&#8221;, and &#8220;S&#8221; columns.</td></tr><tr><td><code>^([A-Z]{1})\r\n</code></td><td>Nothing</td><td
width="250px">This removes the actual text &#8220;L&#8221;, &#8220;M&#8221;, and &#8220;S&#8221; from lines 3-5 of the unprocessed file.</td></tr></table><p>If you did this correctly, you should end up with a text file like <a
href="http://news.mrdwab.com/wp-content/uploads/2011/03/unprocessed-1b.txt">the one linked here</a>.</p><div
class="wp_codebox"><table><tr
id="p88419"><td
class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
</pre></td><td
class="code" id="p884code19"><pre class="txt" style="font-family:monospace;">Year: Month
Month
1st
3rd
5th
15th
25th
50th
75th
85th
95th
97th
99th
&nbsp;
0
&nbsp;
&nbsp;
&nbsp;
2.3
2.4
2.5
2.8
2.9
3.2
3.6
3.7
4.0
4.2
4.4
&nbsp;
1
&nbsp;
&nbsp;
&nbsp;
3.0
3.2
3.3
3.6
3.8
4.2
4.6
4.8
5.2
5.4
5.7
&nbsp;
2
&nbsp;
&nbsp;
&nbsp;
3.8
4.0
4.1
4.5
4.7
5.1
5.6
5.9
6.3
6.5
6.9
&nbsp;
3
&nbsp;
&nbsp;
&nbsp;
4.4
4.6
4.7
5.1
5.4
5.8
6.4
6.7
7.2
7.4
7.8
&nbsp;
4
&nbsp;
&nbsp;
&nbsp;
4.8
5.1
5.2
5.6
5.9
6.4
7.0
7.3
7.9
8.1
8.6
&nbsp;
5
&nbsp;
&nbsp;
&nbsp;
5.2
5.5
5.6
6.1
6.4
6.9
7.5
7.8
8.4
8.7
9.2
&nbsp;
6
&nbsp;
&nbsp;
&nbsp;
5.5
5.8
6.0
6.4
6.7
7.3
7.9
8.3
8.9
9.2
9.7
&nbsp;
7
&nbsp;
&nbsp;
&nbsp;
5.8
6.1
6.3
6.7
7.0
7.6
8.3
8.7
9.4
9.6
10.2
&nbsp;
8
&nbsp;
&nbsp;
&nbsp;
6.0
6.3
6.5
7.0
7.3
7.9
8.6
9.0
9.7
10.0
10.6
&nbsp;
9
&nbsp;
&nbsp;
&nbsp;
6.2
6.6
6.8
7.3
7.6
8.2
8.9
9.3
10.1
10.4
11.0
&nbsp;
10
&nbsp;
&nbsp;
&nbsp;
6.4
6.8
7.0
7.5
7.8
8.5
9.2
9.6
10.4
10.7
11.3
&nbsp;
11
&nbsp;
&nbsp;
&nbsp;
6.6
7.0
7.2
7.7
8.0
8.7
9.5
9.9
10.7
11.0
11.7
&nbsp;
12
&nbsp;
&nbsp;
&nbsp;
6.8
7.1
7.3
7.9
8.2
8.9
9.7
10.2
11.0
11.3
12.0</pre></td></tr></table></div><p>Do a &#8220;select all&#8221;, copy, and switch over to R. We&#8217;re going to first create a matrix with the values from line 15 to the end, and then create the column names for the matrix from lines 2 through 13. Do the following magic:</p><div
class="wp_codebox"><table><tr
id="p88420"><td
class="code" id="p884code20"><pre class="rsplus" style="font-family:monospace;"><span style="color: #080;">&gt;</span> birthweight.<span style="">percentiles</span> <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">matrix</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">scan</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;clipboard&quot;</span>, skip<span style="color: #080;">=</span><span style="color: #ff0000;">14</span><span style="color: #080;">&#41;</span>, <span style="color: #0000FF; font-weight: bold;">ncol</span><span style="color: #080;">=</span><span style="color: #ff0000;">12</span>, byrow<span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">T</span><span style="color: #080;">&#41;</span>
Read <span style="color: #ff0000;">156</span> items
<span style="color: #080;">&gt;</span> <span style="color: #0000FF; font-weight: bold;">colnames</span><span style="color: #080;">&#40;</span>birthweight.<span style="">percentiles</span><span style="color: #080;">&#41;</span> <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">scan</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;clipboard&quot;</span>, what<span style="color: #080;">=</span><span style="color: #ff0000;">&quot;character&quot;</span>,
<span style="color: #080;">+</span>                                          skip<span style="color: #080;">=</span><span style="color: #ff0000;">1</span>, n<span style="color: #080;">=</span><span style="color: #ff0000;">12</span><span style="color: #080;">&#41;</span>
Read <span style="color: #ff0000;">12</span> items
<span style="color: #080;">&gt;</span> birthweight.<span style="">percentiles</span>
      Month 1st 3rd 5th 15th 25th 50th 75th 85th 95th 97th 99th
 <span style="color: #080;">&#91;</span><span style="color: #ff0000;">1</span>,<span style="color: #080;">&#93;</span>     <span style="color: #ff0000;">0</span> <span style="color: #ff0000;">2.3</span> <span style="color: #ff0000;">2.4</span> <span style="color: #ff0000;">2.5</span>  <span style="color: #ff0000;">2.8</span>  <span style="color: #ff0000;">2.9</span>  <span style="color: #ff0000;">3.2</span>  <span style="color: #ff0000;">3.6</span>  <span style="color: #ff0000;">3.7</span>  <span style="color: #ff0000;">4.0</span>  <span style="color: #ff0000;">4.2</span>  <span style="color: #ff0000;">4.4</span>
 <span style="color: #080;">&#91;</span><span style="color: #ff0000;">2</span>,<span style="color: #080;">&#93;</span>     <span style="color: #ff0000;">1</span> <span style="color: #ff0000;">3.0</span> <span style="color: #ff0000;">3.2</span> <span style="color: #ff0000;">3.3</span>  <span style="color: #ff0000;">3.6</span>  <span style="color: #ff0000;">3.8</span>  <span style="color: #ff0000;">4.2</span>  <span style="color: #ff0000;">4.6</span>  <span style="color: #ff0000;">4.8</span>  <span style="color: #ff0000;">5.2</span>  <span style="color: #ff0000;">5.4</span>  <span style="color: #ff0000;">5.7</span>
 <span style="color: #080;">&#91;</span><span style="color: #ff0000;">3</span>,<span style="color: #080;">&#93;</span>     <span style="color: #ff0000;">2</span> <span style="color: #ff0000;">3.8</span> <span style="color: #ff0000;">4.0</span> <span style="color: #ff0000;">4.1</span>  <span style="color: #ff0000;">4.5</span>  <span style="color: #ff0000;">4.7</span>  <span style="color: #ff0000;">5.1</span>  <span style="color: #ff0000;">5.6</span>  <span style="color: #ff0000;">5.9</span>  <span style="color: #ff0000;">6.3</span>  <span style="color: #ff0000;">6.5</span>  <span style="color: #ff0000;">6.9</span>
 <span style="color: #080;">&#91;</span><span style="color: #ff0000;">4</span>,<span style="color: #080;">&#93;</span>     <span style="color: #ff0000;">3</span> <span style="color: #ff0000;">4.4</span> <span style="color: #ff0000;">4.6</span> <span style="color: #ff0000;">4.7</span>  <span style="color: #ff0000;">5.1</span>  <span style="color: #ff0000;">5.4</span>  <span style="color: #ff0000;">5.8</span>  <span style="color: #ff0000;">6.4</span>  <span style="color: #ff0000;">6.7</span>  <span style="color: #ff0000;">7.2</span>  <span style="color: #ff0000;">7.4</span>  <span style="color: #ff0000;">7.8</span>
 <span style="color: #080;">&#91;</span><span style="color: #ff0000;">5</span>,<span style="color: #080;">&#93;</span>     <span style="color: #ff0000;">4</span> <span style="color: #ff0000;">4.8</span> <span style="color: #ff0000;">5.1</span> <span style="color: #ff0000;">5.2</span>  <span style="color: #ff0000;">5.6</span>  <span style="color: #ff0000;">5.9</span>  <span style="color: #ff0000;">6.4</span>  <span style="color: #ff0000;">7.0</span>  <span style="color: #ff0000;">7.3</span>  <span style="color: #ff0000;">7.9</span>  <span style="color: #ff0000;">8.1</span>  <span style="color: #ff0000;">8.6</span>
 <span style="color: #080;">&#91;</span><span style="color: #ff0000;">6</span>,<span style="color: #080;">&#93;</span>     <span style="color: #ff0000;">5</span> <span style="color: #ff0000;">5.2</span> <span style="color: #ff0000;">5.5</span> <span style="color: #ff0000;">5.6</span>  <span style="color: #ff0000;">6.1</span>  <span style="color: #ff0000;">6.4</span>  <span style="color: #ff0000;">6.9</span>  <span style="color: #ff0000;">7.5</span>  <span style="color: #ff0000;">7.8</span>  <span style="color: #ff0000;">8.4</span>  <span style="color: #ff0000;">8.7</span>  <span style="color: #ff0000;">9.2</span>
 <span style="color: #080;">&#91;</span><span style="color: #ff0000;">7</span>,<span style="color: #080;">&#93;</span>     <span style="color: #ff0000;">6</span> <span style="color: #ff0000;">5.5</span> <span style="color: #ff0000;">5.8</span> <span style="color: #ff0000;">6.0</span>  <span style="color: #ff0000;">6.4</span>  <span style="color: #ff0000;">6.7</span>  <span style="color: #ff0000;">7.3</span>  <span style="color: #ff0000;">7.9</span>  <span style="color: #ff0000;">8.3</span>  <span style="color: #ff0000;">8.9</span>  <span style="color: #ff0000;">9.2</span>  <span style="color: #ff0000;">9.7</span>
 <span style="color: #080;">&#91;</span><span style="color: #ff0000;">8</span>,<span style="color: #080;">&#93;</span>     <span style="color: #ff0000;">7</span> <span style="color: #ff0000;">5.8</span> <span style="color: #ff0000;">6.1</span> <span style="color: #ff0000;">6.3</span>  <span style="color: #ff0000;">6.7</span>  <span style="color: #ff0000;">7.0</span>  <span style="color: #ff0000;">7.6</span>  <span style="color: #ff0000;">8.3</span>  <span style="color: #ff0000;">8.7</span>  <span style="color: #ff0000;">9.4</span>  <span style="color: #ff0000;">9.6</span> <span style="color: #ff0000;">10.2</span>
 <span style="color: #080;">&#91;</span><span style="color: #ff0000;">9</span>,<span style="color: #080;">&#93;</span>     <span style="color: #ff0000;">8</span> <span style="color: #ff0000;">6.0</span> <span style="color: #ff0000;">6.3</span> <span style="color: #ff0000;">6.5</span>  <span style="color: #ff0000;">7.0</span>  <span style="color: #ff0000;">7.3</span>  <span style="color: #ff0000;">7.9</span>  <span style="color: #ff0000;">8.6</span>  <span style="color: #ff0000;">9.0</span>  <span style="color: #ff0000;">9.7</span> <span style="color: #ff0000;">10.0</span> <span style="color: #ff0000;">10.6</span>
<span style="color: #080;">&#91;</span><span style="color: #ff0000;">10</span>,<span style="color: #080;">&#93;</span>     <span style="color: #ff0000;">9</span> <span style="color: #ff0000;">6.2</span> <span style="color: #ff0000;">6.6</span> <span style="color: #ff0000;">6.8</span>  <span style="color: #ff0000;">7.3</span>  <span style="color: #ff0000;">7.6</span>  <span style="color: #ff0000;">8.2</span>  <span style="color: #ff0000;">8.9</span>  <span style="color: #ff0000;">9.3</span> <span style="color: #ff0000;">10.1</span> <span style="color: #ff0000;">10.4</span> <span style="color: #ff0000;">11.0</span>
<span style="color: #080;">&#91;</span><span style="color: #ff0000;">11</span>,<span style="color: #080;">&#93;</span>    <span style="color: #ff0000;">10</span> <span style="color: #ff0000;">6.4</span> <span style="color: #ff0000;">6.8</span> <span style="color: #ff0000;">7.0</span>  <span style="color: #ff0000;">7.5</span>  <span style="color: #ff0000;">7.8</span>  <span style="color: #ff0000;">8.5</span>  <span style="color: #ff0000;">9.2</span>  <span style="color: #ff0000;">9.6</span> <span style="color: #ff0000;">10.4</span> <span style="color: #ff0000;">10.7</span> <span style="color: #ff0000;">11.3</span>
<span style="color: #080;">&#91;</span><span style="color: #ff0000;">12</span>,<span style="color: #080;">&#93;</span>    <span style="color: #ff0000;">11</span> <span style="color: #ff0000;">6.6</span> <span style="color: #ff0000;">7.0</span> <span style="color: #ff0000;">7.2</span>  <span style="color: #ff0000;">7.7</span>  <span style="color: #ff0000;">8.0</span>  <span style="color: #ff0000;">8.7</span>  <span style="color: #ff0000;">9.5</span>  <span style="color: #ff0000;">9.9</span> <span style="color: #ff0000;">10.7</span> <span style="color: #ff0000;">11.0</span> <span style="color: #ff0000;">11.7</span>
<span style="color: #080;">&#91;</span><span style="color: #ff0000;">13</span>,<span style="color: #080;">&#93;</span>    <span style="color: #ff0000;">12</span> <span style="color: #ff0000;">6.8</span> <span style="color: #ff0000;">7.1</span> <span style="color: #ff0000;">7.3</span>  <span style="color: #ff0000;">7.9</span>  <span style="color: #ff0000;">8.2</span>  <span style="color: #ff0000;">8.9</span>  <span style="color: #ff0000;">9.7</span> <span style="color: #ff0000;">10.2</span> <span style="color: #ff0000;">11.0</span> <span style="color: #ff0000;">11.3</span> <span style="color: #ff0000;">12.0</span>
<span style="color: #080;">&gt;</span> <span style="color: #228B22;"># Optional: If you need or prefer a data frame instead of a matrix, run:</span>
<span style="color: #080;">&gt;</span> <span style="color: #228B22;"># birthweight.percentiles = as.data.frame(birthweight.percentiles)</span></pre></td></tr></table></div><p>The first line of R code scans in the data values and fills them into a matrix with 12 columns, filling by row from left to right. The second line adds the column names. Notice how we were able to use <code>skip</code> and <code>n</code> to select the values we were interested in at each stage.</p><h2>Example 4: WordPress.com&#8217;s monthly stats table is ugly</h2><p>Really it is! Here&#8217;s a screenshot. It looks pretty, right?</p><div
id="attachment_905" class="wp-caption aligncenter" style="width: 410px"><a
href="http://news.mrdwab.com/wp-content/uploads/2011/03/WordPress-Stats.png" rel="lightbox[884]"><img
src="http://news.mrdwab.com/wp-content/uploads/2011/03/WordPress-Stats-400x218.png" alt="" title="WordPress Stats" width="400" height="218" class="size-medium wp-image-905" /></a><p
class="wp-caption-text">WordPress.com&#039;s monthly stats view. Pretty online, but not easy to copy and paste into other applications.</p></div><p>But if you copy it into an Excel spreadsheet, you get this:</p><p><iframe
width='500' height='300' frameborder='0' src='https://spreadsheets.google.com/pub?hl=en&#038;hl=en&#038;key=0An2f7Ho_4e0fdEFVN3Ezc1h6V2hQNm5pLU1mQlRMQWc&#038;single=true&#038;gid=0&#038;output=html&#038;widget=true'></iframe></p><p>Ugh. How are we supposed to work with this?</p><p>Well, it depends on if you copied from Excel to your text editor, or if you copied directly from the WordPress stats screen to a text editor. I&#8217;ll cover both scenarios below.</p><p><strong><em>Copying from WordPress to Excel to your text editor</em></strong></p><p>If you copied from WordPress to Excel to your text editor, you&#8217;d end up with a text file <a
href="http://news.mrdwab.com/wp-content/uploads/2011/03/unprocessed-2.txt">like the one linked here</a>.</p><div
class="wp_codebox"><table><tr
id="p88421"><td
class="line_numbers"><pre></pre></td><td
class="code" id="p884code21"><pre class="txt" style="font-family:monospace;">24-Jan	25-Jan	26-Jan	27-Jan	28-Jan	29-Jan	30-Jan	201	29
&nbsp;
&nbsp;
19	27	14	32	73	25	11
&nbsp;
31-Jan	1-Feb	2-Feb	3-Feb	4-Feb	5-Feb	6-Feb	302	43	50.25%
&nbsp;
&nbsp;
23	15	63	49	52	29	71
&nbsp;
7-Feb	8-Feb	9-Feb	10-Feb	11-Feb	12-Feb	13-Feb	369	53	22.19%
&nbsp;
&nbsp;
59	35	24	88	29	96	38
&nbsp;
14-Feb	15-Feb	16-Feb	17-Feb	18-Feb	19-Feb	20-Feb	376	54	1.90%
&nbsp;
&nbsp;
115	96	60	41	15	24	25
&nbsp;
21-Feb	22-Feb	23-Feb	24-Feb	25-Feb	26-Feb	27-Feb	291	42	-22.61%
&nbsp;
&nbsp;
51	29	57	23	79	21	31
&nbsp;
28-Feb	1-Mar	2-Mar	3-Mar	4-Mar			187	40	-3.18%
&nbsp;
&nbsp;
25	54	36	46	26</pre></td></tr></table></div><p>This is easy to clean up and import into R. You can simply search for <code>^([0-9]{1,2}-[A-Z].*)</code>, replace it with nothing, copy what&#8217;s left, and read it into R by using something like <code>&gt; data = scan("clipboard")</code>.</p><p>Notice that we sacrifice the dates here, though, along with the weekly totals, averages, and percentage changes that share their lines.</p><p><strong><em>Copying from WordPress to your text editor</em></strong></p><p>If you copied from WordPress to your text editor, you&#8217;d end up with a text file <a
href="http://news.mrdwab.com/wp-content/uploads/2011/03/unprocessed-3.txt">like the one linked here</a>.</p><div
class="wp_codebox"><table><tr
id="p88422"><td
class="line_numbers"><pre></pre></td><td
class="code" id="p884code22"><pre class="txt" style="font-family:monospace;">Jan 24
&nbsp;
19
Jan 25
&nbsp;
27
Jan 26
&nbsp;
14
Jan 27
&nbsp;
32
Jan 28
&nbsp;
73
Jan 29
&nbsp;
25
Jan 30
&nbsp;
11
201	29
Jan 31
&nbsp;
23
Feb 1
&nbsp;
15
Feb 2
&nbsp;
63
Feb 3
&nbsp;
49
Feb 4
&nbsp;
52
Feb 5
&nbsp;
29
Feb 6
&nbsp;
71
302	43	+50.25%
Feb 7
&nbsp;
59
Feb 8
&nbsp;
35
Feb 9
&nbsp;
24
Feb 10
&nbsp;
88
Feb 11
&nbsp;
29
Feb 12
&nbsp;
96
Feb 13
&nbsp;
38
369	53	+22.19%
Feb 14
&nbsp;
115
Feb 15
&nbsp;
96
Feb 16
&nbsp;
60
Feb 17
&nbsp;
41
Feb 18
&nbsp;
15
Feb 19
&nbsp;
24
Feb 20
&nbsp;
25
376	54	+1.90%
Feb 21
&nbsp;
51
Feb 22
&nbsp;
29
Feb 23
&nbsp;
57
Feb 24
&nbsp;
23
Feb 25
&nbsp;
79
Feb 26
&nbsp;
21
Feb 27
&nbsp;
31
291	42	-22.61%
Feb 28
&nbsp;
25
Mar 1
&nbsp;
54
Mar 2
&nbsp;
36
Mar 3
&nbsp;
46
Mar 4
&nbsp;
26
187	40	-3.18%</pre></td></tr></table></div><p>From this file, we can search and replace in a way that either retains the dates or drops them and keeps just the stats.</p><p><strong><em><u>Keep the dates</u></em></strong></p><table
width="100%" cellspacing="12px"><tr><th>Search for this</th><th>Replace with this</th><th
width="250px">Why?</th></tr><tr><td><code>\r\n\r\n</code></td><td><code>\t</code></td><td
width="250px">This isn&#8217;t actually a regular expression; it just uses &#8220;escape sequences&#8221;. We&#8217;re replacing two consecutive line breaks (that is, the blank line between each date and its count) with a tab. This will result in a format like &#8220;Date tab number-of-visits&#8221;. It will also make it easier for us to do the next step of search and replace.</td></tr><tr><td><code>^([0-9].*)</code></td><td>Nothing</td><td
width="250px">In our previous step, we ended up with a convenient scenario where the lines we&#8217;re interested in start with a letter, and all the other lines (the weekly totals) start with a number. We can now easily remove those lines.</td></tr></table><p>You can now easily copy this to your clipboard and read it into R using <code>&gt; data = read.table("clipboard", header=F)</code>. Passing <code>header=F</code> makes it explicit that the first line is data, not column names. Note that <code>read.table</code> splits on any whitespace by default, so the month and the day land in separate columns; add <code>sep="\t"</code> if you want each date kept in a single column.</p><p><strong><em>Just gimme the data</em></strong></p><p>Do everything you did before, but add one more regular expression search and replace. Search for <code>^([A-Z].*)\t(.*)</code> and replace it with <code>\2</code>. This will create a list of just the data that can be read into R by using something like <code>&gt; data = scan("clipboard")</code>.</p><h2>Example 5 &#8211; Merged cells are a pain&#8230;.</h2><p>Your friend gives you <a
href="http://news.mrdwab.com/wp-content/uploads/2011/03/REGEX-TEST.pdf">this PDF</a> with a beautiful table in it, and you need to extract the data. When you copy it into Microsoft Word, OpenOffice Writer, or whatever you prefer, it looks terrible, like the unformatted text below:</p><pre>
1. Organization M (117) 1 Andhra Pradesh 7
2 Arunachal Pradesh 8
3 Assam 8
4 Bihar 24
5 Chattisgarh 2
6 Goa 15
7 Gujarat 19
8 Haryana 4
9 Himachal Pradesh 14
10 Jammu and Kashmir 2
11 Jharkhand 2
12 Karnataka 4
13 Kerala 2
14 Madhya Pradesh 2
15 Maharashtra 2
16 Manipur 2
2. Foundation X (69) 17 Meghalaya 29
18 Mizoram 10
19 Nagaland 4
20 Odisha 12
21 Puducherry 14
3. NGO Z (8) 22 Punjab 8
4. Government (16) 23 Rajasthan 16
5. Research Institute A (8) 24 Sikkim 4
25 Tamil Nadu 4
6. Organization C (36) 26 Tripura 8
27 Uttar Pradesh 15
28 Uttarakhand 1
29 West Bengal 12
</pre><p>This is actually also pretty easy to fix with the following set of regular expressions.</p><table
width="100%" cellspacing="12px"><tr><th>Search for this</th><th>Replace with this</th><th
width="250px">Why?</th></tr><tr><td><code>^([0-9]\. )(.*) \(([0-9]+)\)( )</code></td><td><code>\t\2\t\r\n</code></td><td
width="250px">We&#8217;re trying to match just the name of the organization and put each organization on a line by itself, preceded and followed by a tab.</td></tr><tr><td><code>^([0-9]+) (.*) (.*)</code></td><td><code>\1\t\2\t\3</code></td><td
width="250px">We want to break up the state information into three parts (row number, state name, and count) and insert a tab between them to mark each column.</td></tr></table><p>After doing these two steps, your data should now look like this:</p><pre>
	Organization M
1	Andhra Pradesh	7
2	Arunachal Pradesh	8
3	Assam	8
4	Bihar	24
5	Chattisgarh	2
6	Goa	15
7	Gujarat	19
8	Haryana	4
9	Himachal Pradesh	14
10	Jammu and Kashmir	2
11	Jharkhand	2
12	Karnataka	4
13	Kerala	2
14	Madhya Pradesh	2
15	Maharashtra	2
16	Manipur	2
	Foundation X
17	Meghalaya	29
18	Mizoram	10
19	Nagaland	4
20	Odisha	12
21	Puducherry	14
	NGO Z
22	Punjab	8
	Government
23	Rajasthan	16
	Research Institute A
24	Sikkim	4
25	Tamil Nadu	4
	Organization C
26	Tripura	8
27	Uttar Pradesh	15
28	Uttarakhand	1
29	West Bengal	12
</pre><p>This is now tab-delimited text, so it can easily be imported into any program that uses such files, or can be pasted into a word processor and converted into a table by using the &#8220;text to table&#8221; feature that any decent word processor should have.</p><h2>Bonus session</h2><p>Imagine (for whatever reason) you wanted to use the table from Example 5 in R. Here&#8217;s how you can do it. First, count how many sites (rows) each organization is working in and make a note of that. In this case, it&#8217;s <code>16, 5, 1, 1, 2, 4</code>. Then, do the following:</p><table
width="100%" cellspacing="12px"><tr><th>Search for this</th><th>Replace with this</th><th
width="250px">Why?</th></tr><tr><td><code>^([0-9]\. )(.*) \(.*</code></td><td><code>"\2"</code></td><td
width="250px">We want just the organization&#8217;s name, and this is the first step to help us with that. Be sure to put the replace reference in quotes so that R&#8217;s <code>scan</code> reads each multi-word name as a single item.</td></tr><tr><td><code>^[0-9].*</code></td><td>Nothing</td><td
width="250px">We now have just the names of each organization (with a lot of blank lines in between).</td></tr></table><p>Copy whatever is in your text editor, switch over to R and enter the following:</p><div
class="wp_codebox"><table><tr
id="p88423"><td
class="line_numbers"><pre></pre></td><td
class="code" id="p884code23"><pre class="rsplus" style="font-family:monospace;"><span style="color: #080;">&gt;</span> orgs <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">rep</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">scan</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;clipboard&quot;</span>, what<span style="color: #080;">=</span><span style="color: #ff0000;">&quot;character&quot;</span><span style="color: #080;">&#41;</span>, <span style="color: #0000FF; font-weight: bold;">c</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">16</span>, <span style="color: #ff0000;">5</span>, <span style="color: #ff0000;">1</span>, <span style="color: #ff0000;">1</span>, <span style="color: #ff0000;">2</span>, <span style="color: #ff0000;">4</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span></pre></td></tr></table></div><p>Switch back to your text editor, and revert the file to its original state (just do a couple of undos). Then, do what you did in Example 5 except:</p><ul><li>In the first step, instead of replacing with <code>\t\2\t\r\n</code>, replace with nothing.</li><li>In the next step, instead of replacing with <code>\1\t\2\t\3</code>, replace with <code>\2\t\3</code>.</li></ul><p> Again, copy whatever&#8217;s in your text file, switch back over to R, and do the following:</p><div
class="wp_codebox"><table><tr
id="p88424"><td
class="line_numbers"><pre></pre></td><td
class="code" id="p884code24"><pre class="rsplus" style="font-family:monospace;"><span style="color: #080;">&gt;</span> organizations <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">read.<span style="">delim</span></span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;clipboard&quot;</span>, header<span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">F</span><span style="color: #080;">&#41;</span>
<span style="color: #080;">&gt;</span> <span style="color: #0000FF; font-weight: bold;">cbind</span><span style="color: #080;">&#40;</span>orgs, organizations<span style="color: #080;">&#41;</span>
                   orgs                V1 V2
<span style="color: #ff0000;">1</span>        Organization M    Andhra Pradesh  <span style="color: #ff0000;">7</span>
<span style="color: #ff0000;">2</span>        Organization M Arunachal Pradesh  <span style="color: #ff0000;">8</span>
<span style="color: #ff0000;">3</span>        Organization M             Assam  <span style="color: #ff0000;">8</span>
<span style="color: #ff0000;">4</span>        Organization M             Bihar <span style="color: #ff0000;">24</span>
<span style="color: #ff0000;">5</span>        Organization M       Chattisgarh  <span style="color: #ff0000;">2</span>
<span style="color: #ff0000;">6</span>        Organization M               Goa <span style="color: #ff0000;">15</span>
<span style="color: #ff0000;">7</span>        Organization M           Gujarat <span style="color: #ff0000;">19</span>
<span style="color: #ff0000;">8</span>        Organization M           Haryana  <span style="color: #ff0000;">4</span>
<span style="color: #ff0000;">9</span>        Organization M  Himachal Pradesh <span style="color: #ff0000;">14</span>
<span style="color: #ff0000;">10</span>       Organization M Jammu and Kashmir  <span style="color: #ff0000;">2</span>
<span style="color: #ff0000;">11</span>       Organization M         Jharkhand  <span style="color: #ff0000;">2</span>
<span style="color: #ff0000;">12</span>       Organization M         Karnataka  <span style="color: #ff0000;">4</span>
<span style="color: #ff0000;">13</span>       Organization M            Kerala  <span style="color: #ff0000;">2</span>
<span style="color: #ff0000;">14</span>       Organization M    Madhya Pradesh  <span style="color: #ff0000;">2</span>
<span style="color: #ff0000;">15</span>       Organization M       Maharashtra  <span style="color: #ff0000;">2</span>
<span style="color: #ff0000;">16</span>       Organization M           Manipur  <span style="color: #ff0000;">2</span>
<span style="color: #ff0000;">17</span>         Foundation X         Meghalaya <span style="color: #ff0000;">29</span>
<span style="color: #ff0000;">18</span>         Foundation X           Mizoram <span style="color: #ff0000;">10</span>
<span style="color: #ff0000;">19</span>         Foundation X          Nagaland  <span style="color: #ff0000;">4</span>
<span style="color: #ff0000;">20</span>         Foundation X            Odisha <span style="color: #ff0000;">12</span>
<span style="color: #ff0000;">21</span>         Foundation X        Puducherry <span style="color: #ff0000;">14</span>
<span style="color: #ff0000;">22</span>                NGO Z            Punjab  <span style="color: #ff0000;">8</span>
<span style="color: #ff0000;">23</span>           Government         Rajasthan <span style="color: #ff0000;">16</span>
<span style="color: #ff0000;">24</span> Research Institute A            Sikkim  <span style="color: #ff0000;">4</span>
<span style="color: #ff0000;">25</span> Research Institute A        Tamil Nadu  <span style="color: #ff0000;">4</span>
<span style="color: #ff0000;">26</span>       Organization <span style="color: #0000FF; font-weight: bold;">C</span>           Tripura  <span style="color: #ff0000;">8</span>
<span style="color: #ff0000;">27</span>       Organization <span style="color: #0000FF; font-weight: bold;">C</span>     Uttar Pradesh <span style="color: #ff0000;">15</span>
<span style="color: #ff0000;">28</span>       Organization <span style="color: #0000FF; font-weight: bold;">C</span>       Uttarakhand  <span style="color: #ff0000;">1</span>
<span style="color: #ff0000;">29</span>       Organization <span style="color: #0000FF; font-weight: bold;">C</span>       West Bengal <span style="color: #ff0000;">12</span></pre></td></tr></table></div> ]]></content:encoded> <wfw:commentRss>http://news.mrdwab.com/2011/03/07/sounds-interesting-is-that-a-regular-expression/feed/</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item><title>Effective Communication?</title><link>http://news.mrdwab.com/2011/02/12/effective-communication/</link> <comments>http://news.mrdwab.com/2011/02/12/effective-communication/#comments</comments> <pubDate>Sat, 12 Feb 2011 16:51:20 +0000</pubDate> <dc:creator>Ananda</dc:creator> <category><![CDATA[(all categories)]]></category> <category><![CDATA[(non) fiction]]></category> <category><![CDATA[India]]></category> <category><![CDATA[Stories]]></category> <category><![CDATA[classroom experiments]]></category> <category><![CDATA[communication]]></category> <category><![CDATA[spectrum]]></category> <category><![CDATA[Tata-Dhan Academy]]></category> <guid
isPermaLink="false">http://news.mrdwab.com/?p=876</guid> <description><![CDATA[When people begin the study of communication, their attitudes vary anywhere from “I think this would be a very important class: it is important to understand the communication process if I want to improve the effectiveness of my communication,” to “What a waste of time. I’ve been communicating all my life. Do I really need [...]]]></description> <content:encoded><![CDATA[<p>When people begin the study of communication, their attitudes vary anywhere from “I think this would be a very important class: it is important to understand the communication process if I want to improve the effectiveness of my communication,” to “What a waste of time. I’ve been communicating all my life. Do I really need to take a course to understand communication?”</p><p>Whether or not we take a course in communication, there is considerable value in trying to refine our understanding of communication. To demonstrate, I will present two class exercises. In describing the exercises, hopefully some of the jargon common in the communications discipline (for example, encoding, decoding, channel, and congruence) will become clearer, and you will be at least a little more sensitive to trying to verify the effectiveness of your everyday communication approaches.</p><p><span
id="more-876"></span></p><h2>Exercise 1: Oral Instructions</h2><p>The first exercise involved oral descriptions of drawings composed of basic geometric shapes. One student (we will call him the “sender”) was shown a simple drawing to describe to his classmates (the “receivers”) who were, in turn, supposed to try to recreate the drawing being described.</p><p>However, there were restrictions. There were encoding restrictions on the person giving the instructions: they could describe the image only by using basic terms relating to shapes, lengths, directions, distances, and so on, and they could not use gestures in their instructions. For instance, in the second source drawing, the sender could not say, “Draw a house sitting upon a mound&#8230;” but would have to say something like “Draw the top half of a circle. Above that semi-circle, draw a square&#8230;” and so on. In the decoding process by the receivers, the option to ask questions of the sender was removed. The channel was handicapped by imposing a five-minute time limit for the descriptions and the drawing.</p><p>As can be seen in Figure 1, the resulting drawings are quite varied, although in some cases (particularly the drawing of the cube with a six-pointed star on its face), they do come close to reproducing the source.<br
/><div
id="attachment_423" class="wp-caption aligncenter" style="width: 416px"><a
href="http://tdapdm.files.wordpress.com/2011/02/classroom-experiments.png" rel="lightbox[876]"><img
src="http://tdapdm.files.wordpress.com/2011/02/classroom-experiments.png" alt="" title="Classroom Experiments" width="406" height="1036" class="size-full wp-image-423" /></a><p
class="wp-caption-text">Figure 1: Three sets of drawings based on oral instructions. The first (outlined) drawing in each set is the source figure that the sender had to give instructions for. The remaining are the interpretation of the instructions by the receivers.</p></div></p><h2>Exercise 2: Written Instructions</h2><p>After a classroom discussion of the outcomes, one of the students strongly felt that things would be quite different if the instructions were given in written form. “By taking time to write the instructions,” he asserted, “we would be able to make sure that the instructions are clear enough that anyone reading it would be able to understand them.”</p><p>We decided to test his theory by writing instructions for drawing a cube with a six-pointed star on one face. Here are some of the instructions written by the students and the resulting drawings from various faculty and staff.</p><h3>Drawing 1</h3><p><a
href="http://tdapdm.files.wordpress.com/2011/02/box-1.png" rel="lightbox[876]"><img
src="http://tdapdm.files.wordpress.com/2011/02/box-1.png?w=150" alt="" title="Box 1" width="150" height="108" class="alignright size-thumbnail wp-image-425" /></a>(1) First, draw a square. (2) In the middle of the square, make a small triangle. (3) Make another triangle also in the middle of the square just opposite to the first triangle, but it is on the first triangle. (4) Above the square, draw a line parallel to the line of the square and join the sides and form a parallelogram. (5) From the left side, make another parallelogram with the help of one side of the square.<br
/> <br
style="clear:both" /></p><h3>Drawing 2</h3><p><a
href="http://tdapdm.files.wordpress.com/2011/02/box-2.png" rel="lightbox[876]"><img
src="http://tdapdm.files.wordpress.com/2011/02/box-2.png?w=150" alt="" title="Box 2" width="150" height="124" class="alignright size-thumbnail wp-image-427" /></a>(1) Draw a square (4 centimetre). (2) Make a six-point star inside square. (3) Put the name of square ABCD. (4) Take line AB and draw two slanting lines of 1 centimetre from A and B leftward. (5) Give E above A; give F above B. (6) Match F and E. (7) Now take point A and D. Draw a slanting line of 1 centimetre upward. (8) Give the name G. (9) Match F and G.<br
/> <br
style="clear:both" /></p><h3>Drawing 3</h3><p><a
href="http://tdapdm.files.wordpress.com/2011/02/box-3.png" rel="lightbox[876]"><img
src="http://tdapdm.files.wordpress.com/2011/02/box-3.png?w=150" alt="" title="Box 3" width="150" height="135" class="alignright size-thumbnail wp-image-429" /></a>Please make a diagram with the help of the following instructions. (1) First, make a square (which has all four sides at 90 degree angles). (2) Next, on the upper side and right side of this square, increase a line which will join. After joining, it should look like a cube, but only on upper and right sides. (3) Then, within the box, make a triangle which has 60 degree angles from both side lines; one line should be as a base line. (4) In the next step, make the same triangle but opposite on the first triangle, which should have five points.<br
/> <br
style="clear:both" /></p><h2>What Went Wrong?</h2><p>In the discussions following both exercises, there were a lot of attempts to explain why things did not go as expected. When discussing the experience of giving oral instructions, some criticisms targeted the restrictions, which some argued were somewhat arbitrary. Yet, we deal with such restrictions regularly. Many of us, for instance, rely on text messages (or services like Twitter) for a considerable amount of our communication, and these messages are restricted to fewer than 200 characters. An organization preparing a television or radio advertisement has to work within a limited time frame, somewhere between 10 and 30 seconds. When writing a classified advertisement or a newspaper advertisement, you are likely to be restricted by the number of square inches of space your advertisement occupies.</p><p>In other instances, the receiver was simply confused. In the second drawing example for written instructions, the receiver got stuck trying to figure out how to draw a six-pointed star: around her paper there were many star scribbles, but they were all of the more typical five-pointed kind that we would be inclined to draw if someone had asked us to draw a star. In the end, she did not have enough time to complete her drawing. While the sender would have interpreted her message as being clear and direct, the receiver clearly needed some supplementary instruction.</p><p>There were also cases where the instructions were not congruent. In the third drawing example for written instructions, the sender tells us to draw two overlapping triangles, but then tells us that the resulting shape should have five points. Similarly, in the first drawing example, the receiver felt most confused by the fifth instruction. The fourth instruction did not specify in which direction the parallelogram should be oriented. 
After she had already drawn what she interpreted as the fourth instruction, the fifth instruction did not seem to be in congruence with what she had already drawn.</p><p>It is also interesting to note that it is difficult to say whether people were right or wrong in what they drew. Most of the receivers would strongly assert that they drew according to their interpretation of the instructions that they were given. From that perspective, what they drew was correct. Many senders, on the other hand, did acknowledge that they had not tested their message to see whether it conveyed what they intended. A few of them also admitted to “cheating” a little by being close to the receiver and reinforcing the written statements by providing supplementary oral instructions.</p><h2>What Can We Learn From This?</h2><p>Perhaps the most important message from these experiences is that we should not assume that our communication is effective. Rather, we should build into our communication methods that confirm that the receiver has understood our message. (In teaching parlance, this is commonly referred to as a comprehension check.) Often, these are simple questions or brief activities presented to the audience to verify that their understanding of the message is correct.</p><p>These experiences also indicate that there is no single channel that is best for effective communication. Most likely, a combination of approaches would be most effective since different receivers have different learning or comprehension styles. Imagine, for instance, how much easier it would have been if we could have simply shown the audience the drawing and asked them to reproduce it. Or how helpful it would be if we could also use gestures along with our words to mimic the act of drawing, thus also helping the receiver to create an accurate drawing.</p><p>Keeping these lessons in mind, we should also try to identify what type of communicator we are. 
For some, their strengths will be in written communication; others might prefer speaking, demonstrating, illustrating, or employing one of the many communication reinforcement tools available to us. Once you have figured that out, practice—both to keep your strengths strong, and to improve upon your weaknesses.</p><p><em>Cross-posted at the <a
href="http://wp.me/pgZki-6N">Tata-Dhan Academy PDM blog</a> and at <a
href="http://ananda.mahto.info/effective-communication/">Ananda Mahto</a>.</em></p> ]]></content:encoded> <wfw:commentRss>http://news.mrdwab.com/2011/02/12/effective-communication/feed/</wfw:commentRss> <slash:comments>0</slash:comments> </item> </channel> </rss>
