<?xml version="1.0" encoding="UTF-8"?> <rss
version="2.0"
xmlns:content="http://purl.org/rss/1.0/modules/content/"
xmlns:wfw="http://wellformedweb.org/CommentAPI/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:atom="http://www.w3.org/2005/Atom"
xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
> <channel><title>2657 Productions News &#187; Geekiness</title> <atom:link href="http://news.mrdwab.com/category/geekiness/feed/" rel="self" type="application/rss+xml" /><link>http://news.mrdwab.com</link> <description>..:: Whereabouts and Whatabouts of the 2657 World ::..</description> <lastBuildDate>Mon, 16 Jan 2012 05:56:16 +0000</lastBuildDate> <language>en</language> <sy:updatePeriod>hourly</sy:updatePeriod> <sy:updateFrequency>1</sy:updateFrequency> <item><title>Stratified random sampling in R from a data frame</title><link>http://news.mrdwab.com/2011/05/20/stratified-random-sampling-in-r-from-a-data-frame/</link> <comments>http://news.mrdwab.com/2011/05/20/stratified-random-sampling-in-r-from-a-data-frame/#comments</comments> <pubDate>Fri, 20 May 2011 18:01:21 +0000</pubDate> <dc:creator>Ananda</dc:creator> <category><![CDATA[Geekiness]]></category> <category><![CDATA[Useless Knowledge]]></category> <category><![CDATA[code]]></category> <category><![CDATA[R]]></category> <category><![CDATA[R functions]]></category> <category><![CDATA[sampling]]></category> <category><![CDATA[statistics]]></category> <category><![CDATA[stratified sampling]]></category> <guid
isPermaLink="false">http://news.mrdwab.com/?p=1174</guid> <description><![CDATA[After a little bit more work, there&#8217;s a new stratified random sampling function, this one letting you sample from a data frame, returning all the variables for each of your samples as a nice data frame that you can continue working on as usual. Get the function at http://news.mrdwab.com/stratified. Usage notes in the head of [...]]]></description> <content:encoded><![CDATA[<p>After a little more work, here&#8217;s a new stratified random sampling function. This one samples from a data frame and returns all the variables for each sampled row as a data frame, so you can keep working with the result as usual.</p><p>Get the function at <a
href="http://news.mrdwab.com/stratified">http://news.mrdwab.com/stratified</a>. Usage notes are in the comments at the top of the function.</p><p><span
id="more-1174"></span></p><p>Here&#8217;s the function:</p><div
class="wp_codebox"><table><tr
id="p11743"><td
class="code" id="p1174code3"><pre class="rsplus" style="font-family:monospace;">stratified <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">function</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">df</span>, id, group, size, seed<span style="color: #080;">=</span><span style="color: #ff0000;">&quot;NULL&quot;</span>, ...<span style="color: #080;">&#41;</span> <span style="color: #080;">&#123;</span>
  <span style="color: #228B22;">#  USE: * Specify your data frame, ID variable (as column number), and</span>
  <span style="color: #228B22;">#         grouping variable (as column number) as the first three arguments.</span>
  <span style="color: #228B22;">#       * Decide on your sample size. For a sample proportional to the</span>
  <span style="color: #228B22;">#         population, enter &quot;size&quot; as a decimal. For an equal number of</span>
  <span style="color: #228B22;">#         samples from each group, enter &quot;size&quot; as a whole number.</span>
  <span style="color: #228B22;">#       * Decide whether you want to use a seed. If not, leave it blank</span>
  <span style="color: #228B22;">#         or type &quot;NULL&quot; (with quotes). </span>
  <span style="color: #228B22;">#</span>
  <span style="color: #228B22;">#  Example 1: To sample 10% of each group from a data frame named &quot;z&quot;, where</span>
  <span style="color: #228B22;">#             the ID variable is the first variable, the grouping variable</span>
  <span style="color: #228B22;">#             is the fourth variable, and the desired seed is 1, use:</span>
  <span style="color: #228B22;"># </span>
  <span style="color: #228B22;">#                 &gt; stratified(z, 1, 4, .1, 1)</span>
  <span style="color: #228B22;">#</span>
  <span style="color: #228B22;">#  Example 2: To run the same sample as above but without a seed, use:</span>
  <span style="color: #228B22;"># </span>
  <span style="color: #228B22;">#                 &gt; stratified(z, 1, 4, .1)</span>
  <span style="color: #228B22;">#</span>
  <span style="color: #228B22;">#  Example 3: To sample 5 from each group from a data frame named &quot;z&quot;, where</span>
  <span style="color: #228B22;">#             the ID variable is the first variable, the grouping variable</span>
  <span style="color: #228B22;">#             is the third variable, and the desired seed is 2, use:</span>
  <span style="color: #228B22;">#</span>
  <span style="color: #228B22;">#                 &gt; stratified(z, 1, 3, 5, 2)</span>
  <span style="color: #228B22;">#</span>
  <span style="color: #228B22;">#  NOTE: Not tested on datasets with LOTS of groups or with HUGE</span>
  <span style="color: #228B22;">#        differences in group sizes. Probably INCREDIBLY inefficient.</span>
&nbsp;
  k <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">unstack</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">data.<span style="">frame</span></span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">as.<span style="">vector</span></span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">df</span><span style="color: #080;">&#91;</span>id<span style="color: #080;">&#93;</span><span style="color: #080;">&#41;</span>, <span style="color: #0000FF; font-weight: bold;">as.<span style="">vector</span></span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">df</span><span style="color: #080;">&#91;</span>group<span style="color: #080;">&#93;</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
  l <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">length</span><span style="color: #080;">&#40;</span>k<span style="color: #080;">&#41;</span>
  results <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">vector</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;list&quot;</span>, l<span style="color: #080;">&#41;</span>
&nbsp;
  <span style="color: #0000FF; font-weight: bold;">if</span> <span style="color: #080;">&#40;</span>seed <span style="color: #080;">==</span> <span style="color: #ff0000;">&quot;NULL&quot;</span> <span style="color: #080;">&amp;</span> size <span style="color: #080;">&lt;</span> <span style="color: #ff0000;">1</span><span style="color: #080;">&#41;</span> <span style="color: #080;">&#123;</span>
      <span style="color: #0000FF; font-weight: bold;">for</span> <span style="color: #080;">&#40;</span>i <span style="color: #0000FF; font-weight: bold;">in</span> <span style="color: #ff0000;">1</span><span style="color: #080;">:</span><span style="color: #0000FF; font-weight: bold;">length</span><span style="color: #080;">&#40;</span>k<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span> <span style="color: #080;">&#123;</span>
        N <span style="color: #080;">=</span> k<span style="color: #080;">&#91;</span><span style="color: #080;">&#91;</span>i<span style="color: #080;">&#93;</span><span style="color: #080;">&#93;</span>
        n <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">round</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">length</span><span style="color: #080;">&#40;</span>N<span style="color: #080;">&#41;</span><span style="color: #080;">*</span>size<span style="color: #080;">&#41;</span>
        results<span style="color: #080;">&#91;</span><span style="color: #080;">&#91;</span>i<span style="color: #080;">&#93;</span><span style="color: #080;">&#93;</span> <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">list</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">sample</span><span style="color: #080;">&#40;</span>N, n, ...<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
      <span style="color: #080;">&#125;</span>
    <span style="color: #080;">&#125;</span> <span style="color: #0000FF; font-weight: bold;">else</span> <span style="color: #0000FF; font-weight: bold;">if</span> <span style="color: #080;">&#40;</span>seed <span style="color: #080;">==</span> <span style="color: #ff0000;">&quot;NULL&quot;</span> <span style="color: #080;">&amp;</span> size <span style="color: #080;">&gt;=</span> <span style="color: #ff0000;">1</span><span style="color: #080;">&#41;</span> <span style="color: #080;">&#123;</span>
      <span style="color: #0000FF; font-weight: bold;">for</span> <span style="color: #080;">&#40;</span>i <span style="color: #0000FF; font-weight: bold;">in</span> <span style="color: #ff0000;">1</span><span style="color: #080;">:</span><span style="color: #0000FF; font-weight: bold;">length</span><span style="color: #080;">&#40;</span>k<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span> <span style="color: #080;">&#123;</span>
        N <span style="color: #080;">=</span> k<span style="color: #080;">&#91;</span><span style="color: #080;">&#91;</span>i<span style="color: #080;">&#93;</span><span style="color: #080;">&#93;</span>
        results<span style="color: #080;">&#91;</span><span style="color: #080;">&#91;</span>i<span style="color: #080;">&#93;</span><span style="color: #080;">&#93;</span> <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">list</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">sample</span><span style="color: #080;">&#40;</span>N, size, ...<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
      <span style="color: #080;">&#125;</span>
    <span style="color: #080;">&#125;</span> <span style="color: #0000FF; font-weight: bold;">else</span> <span style="color: #0000FF; font-weight: bold;">if</span> <span style="color: #080;">&#40;</span>size <span style="color: #080;">&lt;</span> <span style="color: #ff0000;">1</span><span style="color: #080;">&#41;</span> <span style="color: #080;">&#123;</span>
      <span style="color: #0000FF; font-weight: bold;">for</span> <span style="color: #080;">&#40;</span>i <span style="color: #0000FF; font-weight: bold;">in</span> <span style="color: #ff0000;">1</span><span style="color: #080;">:</span><span style="color: #0000FF; font-weight: bold;">length</span><span style="color: #080;">&#40;</span>k<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span> <span style="color: #080;">&#123;</span>
        <span style="color: #0000FF; font-weight: bold;">set.<span style="">seed</span></span><span style="color: #080;">&#40;</span>seed<span style="color: #080;">&#41;</span>
        N <span style="color: #080;">=</span> k<span style="color: #080;">&#91;</span><span style="color: #080;">&#91;</span>i<span style="color: #080;">&#93;</span><span style="color: #080;">&#93;</span>
        n <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">round</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">length</span><span style="color: #080;">&#40;</span>N<span style="color: #080;">&#41;</span><span style="color: #080;">*</span>size<span style="color: #080;">&#41;</span>
        results<span style="color: #080;">&#91;</span><span style="color: #080;">&#91;</span>i<span style="color: #080;">&#93;</span><span style="color: #080;">&#93;</span> <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">list</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">sample</span><span style="color: #080;">&#40;</span>N, n, ...<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
      <span style="color: #080;">&#125;</span>
    <span style="color: #080;">&#125;</span> <span style="color: #0000FF; font-weight: bold;">else</span> <span style="color: #0000FF; font-weight: bold;">if</span> <span style="color: #080;">&#40;</span>size <span style="color: #080;">&gt;=</span> <span style="color: #ff0000;">1</span><span style="color: #080;">&#41;</span> <span style="color: #080;">&#123;</span>
      <span style="color: #0000FF; font-weight: bold;">for</span> <span style="color: #080;">&#40;</span>i <span style="color: #0000FF; font-weight: bold;">in</span> <span style="color: #ff0000;">1</span><span style="color: #080;">:</span><span style="color: #0000FF; font-weight: bold;">length</span><span style="color: #080;">&#40;</span>k<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span> <span style="color: #080;">&#123;</span>
        <span style="color: #0000FF; font-weight: bold;">set.<span style="">seed</span></span><span style="color: #080;">&#40;</span>seed<span style="color: #080;">&#41;</span>
        N <span style="color: #080;">=</span> k<span style="color: #080;">&#91;</span><span style="color: #080;">&#91;</span>i<span style="color: #080;">&#93;</span><span style="color: #080;">&#93;</span>
        results<span style="color: #080;">&#91;</span><span style="color: #080;">&#91;</span>i<span style="color: #080;">&#93;</span><span style="color: #080;">&#93;</span> <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">list</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">sample</span><span style="color: #080;">&#40;</span>N, size, ...<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
      <span style="color: #080;">&#125;</span>
    <span style="color: #080;">&#125;</span>
  z <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">data.<span style="">frame</span></span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">c</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">unlist</span><span style="color: #080;">&#40;</span>results<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
  <span style="color: #0000FF; font-weight: bold;">names</span><span style="color: #080;">&#40;</span>z<span style="color: #080;">&#41;</span> <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">names</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">df</span><span style="color: #080;">&#91;</span>id<span style="color: #080;">&#93;</span><span style="color: #080;">&#41;</span>
  w <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">merge</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">df</span>, z<span style="color: #080;">&#41;</span>
  w<span style="color: #080;">&#91;</span><span style="color: #0000FF; font-weight: bold;">order</span><span style="color: #080;">&#40;</span>w<span style="color: #080;">&#91;</span>group<span style="color: #080;">&#93;</span><span style="color: #080;">&#41;</span>, <span style="color: #080;">&#93;</span>
<span style="color: #080;">&#125;</span></pre></td></tr></table></div><p>And here are some examples of the function in action:</p><div
class="wp_codebox_msgheader"><span
class="right"><sup><a
href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span
style="color: #99cc00">?</span></a></sup></span><span
class="left"><a
href="javascript:;" onclick="javascript:showCodeTxt('p1174code4'); return false;">View Code</a> RSPLUS</span><div
class="codebox_clear"></div></div><div
class="wp_codebox"><table><tr
id="p11744"><td
class="code" id="p1174code4"><pre class="rsplus" style="font-family:monospace;"><span style="color: #080;">&gt;</span> <span style="color: #0000FF; font-weight: bold;">source</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;http://news.mrdwab.com/stratified&quot;</span><span style="color: #080;">&#41;</span>
<span style="color: #080;">&gt;</span> <span style="color: #228B22;"># Make up some data</span>
<span style="color: #080;">&gt;</span> A <span style="color: #080;">=</span> <span style="color: #ff0000;">1</span><span style="color: #080;">:</span><span style="color: #ff0000;">100</span>
<span style="color: #080;">&gt;</span> B <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">sample</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">c</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;AA&quot;</span>, <span style="color: #ff0000;">&quot;BB&quot;</span>, <span style="color: #ff0000;">&quot;CC&quot;</span>, <span style="color: #ff0000;">&quot;DD&quot;</span>, <span style="color: #ff0000;">&quot;EE&quot;</span><span style="color: #080;">&#41;</span>, <span style="color: #ff0000;">100</span>, <span style="color: #0000FF; font-weight: bold;">replace</span><span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">T</span><span style="color: #080;">&#41;</span>
<span style="color: #080;">&gt;</span> <span style="color: #0000FF; font-weight: bold;">C</span> <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">rnorm</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">100</span><span style="color: #080;">&#41;</span>
<span style="color: #080;">&gt;</span> <span style="color: #0000FF; font-weight: bold;">D</span> <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">abs</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">round</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">rnorm</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">100</span><span style="color: #080;">&#41;</span>, digits<span style="color: #080;">=</span><span style="color: #ff0000;">1</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
<span style="color: #080;">&gt;</span> E <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">sample</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">c</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;CA&quot;</span>, <span style="color: #ff0000;">&quot;NY&quot;</span>, <span style="color: #ff0000;">&quot;TX&quot;</span><span style="color: #080;">&#41;</span>, <span style="color: #ff0000;">100</span>, <span style="color: #0000FF; font-weight: bold;">replace</span><span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">T</span><span style="color: #080;">&#41;</span>
<span style="color: #080;">&gt;</span> dat <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">data.<span style="">frame</span></span><span style="color: #080;">&#40;</span>A, B, <span style="color: #0000FF; font-weight: bold;">C</span>, <span style="color: #0000FF; font-weight: bold;">D</span>, E<span style="color: #080;">&#41;</span>
<span style="color: #080;">&gt;</span> <span style="color: #228B22;"># view the first few rows</span>
<span style="color: #080;">&gt;</span> <span style="color: #0000FF; font-weight: bold;">head</span><span style="color: #080;">&#40;</span>dat<span style="color: #080;">&#41;</span>
  A  B           <span style="color: #0000FF; font-weight: bold;">C</span>   <span style="color: #0000FF; font-weight: bold;">D</span>  E
<span style="color: #ff0000;">1</span> <span style="color: #ff0000;">1</span> CC <span style="color: #080;">-</span><span style="color: #ff0000;">0.07870439</span> <span style="color: #ff0000;">0.6</span> NY
<span style="color: #ff0000;">2</span> <span style="color: #ff0000;">2</span> CC <span style="color: #080;">-</span><span style="color: #ff0000;">0.65048634</span> <span style="color: #ff0000;">0.3</span> TX
<span style="color: #ff0000;">3</span> <span style="color: #ff0000;">3</span> EE  <span style="color: #ff0000;">1.02703616</span> <span style="color: #ff0000;">1.3</span> NY
<span style="color: #ff0000;">4</span> <span style="color: #ff0000;">4</span> BB <span style="color: #080;">-</span><span style="color: #ff0000;">1.08696775</span> <span style="color: #ff0000;">0.4</span> TX
<span style="color: #ff0000;">5</span> <span style="color: #ff0000;">5</span> CC  <span style="color: #ff0000;">0.56741795</span> <span style="color: #ff0000;">0.2</span> CA
<span style="color: #ff0000;">6</span> <span style="color: #ff0000;">6</span> AA <span style="color: #080;">-</span><span style="color: #ff0000;">0.46448941</span> <span style="color: #ff0000;">0.5</span> TX
<span style="color: #080;">&gt;</span> <span style="color: #228B22;"># Sample 10% from each group from variable B, no seed</span>
<span style="color: #080;">&gt;</span> stratified<span style="color: #080;">&#40;</span>dat, <span style="color: #ff0000;">1</span>, <span style="color: #ff0000;">2</span>, .1<span style="color: #080;">&#41;</span>
    A  B           <span style="color: #0000FF; font-weight: bold;">C</span>   <span style="color: #0000FF; font-weight: bold;">D</span>  E
<span style="color: #ff0000;">2</span>   <span style="color: #ff0000;">6</span> AA <span style="color: #080;">-</span><span style="color: #ff0000;">0.46448941</span> <span style="color: #ff0000;">0.5</span> TX
<span style="color: #ff0000;">7</span>  <span style="color: #ff0000;">71</span> AA  <span style="color: #ff0000;">1.98128479</span> <span style="color: #ff0000;">2.1</span> CA
<span style="color: #ff0000;">5</span>  <span style="color: #ff0000;">53</span> BB  <span style="color: #ff0000;">1.00539398</span> <span style="color: #ff0000;">0.7</span> NY
<span style="color: #ff0000;">10</span> <span style="color: #ff0000;">97</span> BB  <span style="color: #ff0000;">0.68252675</span> <span style="color: #ff0000;">1.9</span> NY
<span style="color: #ff0000;">1</span>   <span style="color: #ff0000;">1</span> CC <span style="color: #080;">-</span><span style="color: #ff0000;">0.07870439</span> <span style="color: #ff0000;">0.6</span> NY
<span style="color: #ff0000;">4</span>  <span style="color: #ff0000;">42</span> CC <span style="color: #080;">-</span><span style="color: #ff0000;">2.00256854</span> <span style="color: #ff0000;">0.3</span> TX
<span style="color: #ff0000;">8</span>  <span style="color: #ff0000;">76</span> DD <span style="color: #080;">-</span><span style="color: #ff0000;">0.84151459</span> <span style="color: #ff0000;">0.2</span> NY
<span style="color: #ff0000;">9</span>  <span style="color: #ff0000;">95</span> DD <span style="color: #080;">-</span><span style="color: #ff0000;">0.47276142</span> <span style="color: #ff0000;">0.3</span> CA
<span style="color: #ff0000;">11</span> <span style="color: #ff0000;">99</span> DD  <span style="color: #ff0000;">1.05173419</span> <span style="color: #ff0000;">2.1</span> TX
<span style="color: #ff0000;">3</span>  <span style="color: #ff0000;">10</span> EE <span style="color: #080;">-</span><span style="color: #ff0000;">0.69079473</span> <span style="color: #ff0000;">1.1</span> TX
<span style="color: #ff0000;">6</span>  <span style="color: #ff0000;">57</span> EE <span style="color: #080;">-</span><span style="color: #ff0000;">0.38210921</span> <span style="color: #ff0000;">1.5</span> CA
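<span style="color: #080;">&gt;</span> <span style="color: #228B22;"># Added illustration (not from the original session): the fixed-size mode.</span>
<span style="color: #080;">&gt;</span> <span style="color: #228B22;"># A whole-number size draws that many IDs from every group of B (here 3, seed of 2).</span>
<span style="color: #080;">&gt;</span> <span style="color: #228B22;"># Only a check on the group counts is shown, not the sample itself.</span>
<span style="color: #080;">&gt;</span> s5 <span style="color: #080;">=</span> stratified<span style="color: #080;">&#40;</span>dat, <span style="color: #ff0000;">1</span>, <span style="color: #ff0000;">2</span>, <span style="color: #ff0000;">3</span>, <span style="color: #ff0000;">2</span><span style="color: #080;">&#41;</span>
<span style="color: #080;">&gt;</span> <span style="color: #0000FF; font-weight: bold;">all</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">table</span><span style="color: #080;">&#40;</span>s5$B<span style="color: #080;">&#41;</span> <span style="color: #080;">==</span> <span style="color: #ff0000;">3</span><span style="color: #080;">&#41;</span>
<span style="color: #080;">&#91;</span><span style="color: #ff0000;">1</span><span style="color: #080;">&#93;</span> <span style="color: #0000FF; font-weight: bold;">TRUE</span>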
<span style="color: #080;">&gt;</span> <span style="color: #228B22;"># Sample 10% from each group from variable E, seed of 1</span>
<span style="color: #080;">&gt;</span> stratified<span style="color: #080;">&#40;</span>dat, <span style="color: #ff0000;">1</span>, <span style="color: #ff0000;">5</span>, .1, <span style="color: #ff0000;">1</span><span style="color: #080;">&#41;</span>
    A  B          <span style="color: #0000FF; font-weight: bold;">C</span>   <span style="color: #0000FF; font-weight: bold;">D</span>  E
<span style="color: #ff0000;">4</span>  <span style="color: #ff0000;">33</span> AA  <span style="color: #ff0000;">1.6105099</span> <span style="color: #ff0000;">0.5</span> CA
<span style="color: #ff0000;">7</span>  <span style="color: #ff0000;">48</span> AA  <span style="color: #ff0000;">0.3128274</span> <span style="color: #ff0000;">0.6</span> CA
<span style="color: #ff0000;">9</span>  <span style="color: #ff0000;">62</span> DD  <span style="color: #ff0000;">0.4673061</span> <span style="color: #ff0000;">0.0</span> CA
<span style="color: #ff0000;">10</span> <span style="color: #ff0000;">86</span> EE  <span style="color: #ff0000;">0.4047880</span> <span style="color: #ff0000;">1.6</span> CA
<span style="color: #ff0000;">3</span>  <span style="color: #ff0000;">28</span> AA <span style="color: #080;">-</span><span style="color: #ff0000;">1.6815553</span> <span style="color: #ff0000;">0.3</span> NY
<span style="color: #ff0000;">5</span>  <span style="color: #ff0000;">36</span> AA  <span style="color: #ff0000;">0.3307508</span> <span style="color: #ff0000;">0.3</span> NY
<span style="color: #ff0000;">8</span>  <span style="color: #ff0000;">53</span> BB  <span style="color: #ff0000;">1.0053940</span> <span style="color: #ff0000;">0.7</span> NY
<span style="color: #ff0000;">1</span>  <span style="color: #ff0000;">21</span> DD  <span style="color: #ff0000;">0.5229282</span> <span style="color: #ff0000;">1.2</span> TX
<span style="color: #ff0000;">2</span>  <span style="color: #ff0000;">27</span> BB  <span style="color: #ff0000;">0.8678977</span> <span style="color: #ff0000;">0.7</span> TX
<span style="color: #ff0000;">6</span>  <span style="color: #ff0000;">44</span> DD <span style="color: #080;">-</span><span style="color: #ff0000;">0.5790353</span> <span style="color: #ff0000;">0.9</span> TX
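<span style="color: #080;">&gt;</span> <span style="color: #228B22;"># Added illustration (not from the original session): the function calls</span>
<span style="color: #080;">&gt;</span> <span style="color: #228B22;"># set.seed(seed) before each group's draw, so each group's sample matches</span>
<span style="color: #080;">&gt;</span> <span style="color: #228B22;"># seeding and sampling that group on its own. Checked here for CA.</span>
<span style="color: #080;">&gt;</span> s <span style="color: #080;">=</span> stratified<span style="color: #080;">&#40;</span>dat, <span style="color: #ff0000;">1</span>, <span style="color: #ff0000;">5</span>, .1, <span style="color: #ff0000;">1</span><span style="color: #080;">&#41;</span>
<span style="color: #080;">&gt;</span> <span style="color: #0000FF; font-weight: bold;">set.<span style="">seed</span></span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">1</span><span style="color: #080;">&#41;</span>; ca <span style="color: #080;">=</span> dat$A<span style="color: #080;">&#91;</span>dat$E <span style="color: #080;">==</span> <span style="color: #ff0000;">&quot;CA&quot;</span><span style="color: #080;">&#93;</span>
<span style="color: #080;">&gt;</span> <span style="color: #0000FF; font-weight: bold;">identical</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">sort</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">sample</span><span style="color: #080;">&#40;</span>ca, <span style="color: #0000FF; font-weight: bold;">round</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">length</span><span style="color: #080;">&#40;</span>ca<span style="color: #080;">&#41;</span> <span style="color: #080;">*</span> .1<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>, <span style="color: #0000FF; font-weight: bold;">sort</span><span style="color: #080;">&#40;</span>s$A<span style="color: #080;">&#91;</span>s$E <span style="color: #080;">==</span> <span style="color: #ff0000;">&quot;CA&quot;</span><span style="color: #080;">&#93;</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
<span style="color: #080;">&#91;</span><span style="color: #ff0000;">1</span><span style="color: #080;">&#93;</span> <span style="color: #0000FF; font-weight: bold;">TRUE</span>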
<span style="color: #080;">&gt;</span> <span style="color: #228B22;"># You can also be verbose if it helps you remember what you're doing</span>
<span style="color: #080;">&gt;</span> stratified<span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">df</span><span style="color: #080;">=</span>dat, id<span style="color: #080;">=</span><span style="color: #ff0000;">1</span>, group<span style="color: #080;">=</span><span style="color: #ff0000;">5</span>, size<span style="color: #080;">=</span>.1, seed<span style="color: #080;">=</span><span style="color: #ff0000;">1</span><span style="color: #080;">&#41;</span>
    A  B          <span style="color: #0000FF; font-weight: bold;">C</span>   <span style="color: #0000FF; font-weight: bold;">D</span>  E
<span style="color: #ff0000;">4</span>  <span style="color: #ff0000;">33</span> AA  <span style="color: #ff0000;">1.6105099</span> <span style="color: #ff0000;">0.5</span> CA
<span style="color: #ff0000;">7</span>  <span style="color: #ff0000;">48</span> AA  <span style="color: #ff0000;">0.3128274</span> <span style="color: #ff0000;">0.6</span> CA
<span style="color: #ff0000;">9</span>  <span style="color: #ff0000;">62</span> DD  <span style="color: #ff0000;">0.4673061</span> <span style="color: #ff0000;">0.0</span> CA
<span style="color: #ff0000;">10</span> <span style="color: #ff0000;">86</span> EE  <span style="color: #ff0000;">0.4047880</span> <span style="color: #ff0000;">1.6</span> CA
<span style="color: #ff0000;">3</span>  <span style="color: #ff0000;">28</span> AA <span style="color: #080;">-</span><span style="color: #ff0000;">1.6815553</span> <span style="color: #ff0000;">0.3</span> NY
<span style="color: #ff0000;">5</span>  <span style="color: #ff0000;">36</span> AA  <span style="color: #ff0000;">0.3307508</span> <span style="color: #ff0000;">0.3</span> NY
<span style="color: #ff0000;">8</span>  <span style="color: #ff0000;">53</span> BB  <span style="color: #ff0000;">1.0053940</span> <span style="color: #ff0000;">0.7</span> NY
<span style="color: #ff0000;">1</span>  <span style="color: #ff0000;">21</span> DD  <span style="color: #ff0000;">0.5229282</span> <span style="color: #ff0000;">1.2</span> TX
<span style="color: #ff0000;">2</span>  <span style="color: #ff0000;">27</span> BB  <span style="color: #ff0000;">0.8678977</span> <span style="color: #ff0000;">0.7</span> TX
<span style="color: #ff0000;">6</span>  <span style="color: #ff0000;">44</span> DD <span style="color: #080;">-</span><span style="color: #ff0000;">0.5790353</span> <span style="color: #ff0000;">0.9</span> TX</pre></td></tr></table></div> ]]></content:encoded> <wfw:commentRss>http://news.mrdwab.com/2011/05/20/stratified-random-sampling-in-r-from-a-data-frame/feed/</wfw:commentRss> <slash:comments>4</slash:comments> </item> <item><title>Stratified Random Sampling in R&#8211;A Function in Progress</title><link>http://news.mrdwab.com/2011/05/15/stratified-random-sampling-in-r-beta/</link> <comments>http://news.mrdwab.com/2011/05/15/stratified-random-sampling-in-r-beta/#comments</comments> <pubDate>Sun, 15 May 2011 10:16:02 +0000</pubDate> <dc:creator>Ananda</dc:creator> <category><![CDATA[(all categories)]]></category> <category><![CDATA[Geekiness]]></category> <category><![CDATA[Useless Knowledge]]></category> <category><![CDATA[code]]></category> <category><![CDATA[experiments]]></category> <category><![CDATA[R]]></category> <category><![CDATA[R functions]]></category> <category><![CDATA[sampling]]></category> <category><![CDATA[statistics]]></category> <category><![CDATA[tapply()]]></category> <guid
isPermaLink="false">http://news.mrdwab.com/?p=1141</guid> <description><![CDATA[IMPORTANT: This is here mostly to remind me of how I solved my problem. You should read Stratified random sampling in R from a data frame if you really want to use this function. I know that sampling is quite complex, and I will admit that I know very little about its complexities. Fortunately, software [...]]]></description> <content:encoded><![CDATA[<blockquote><p><strong>IMPORTANT</strong>: This is here mostly to remind me of how I solved my problem. You should read <a
href="http://news.mrdwab.com/2011/05/20/stratified-random-sampling-in-r-from-a-data-frame/" title="Stratified random sampling in R from a data frame">Stratified random sampling in R from a data frame</a> if you really want to use this function.</p></blockquote><p>Sampling can get quite complex, and I will admit that I know very little about its finer points. Fortunately, software like <a
href="http://www.r-project.org">R</a> lets you draw <a
href="http://news.mrdwab.com/2009/11/29/simple-sampling-with-r/" title="Simple sampling with R">simple random samples</a> pretty easily, either <a
href="http://news.mrdwab.com/2009/11/30/sampling-with-replacement-in-r/" title="Sampling with replacement in R">with</a> or without replacement. Unfortunately, I could not find a built-in way to do simple stratified random sampling, at least not one with the features I was looking for. Fortunately, with a little experimenting, it is fairly easy to learn how to write your own functions in R when a direct solution does not present itself.</p><p>This post shares my initial &#8220;work-in-progress&#8221; on writing an R function for stratified sampling.</p><p><span
id="more-1141"></span></p><h2>The problem&#8230;</h2><p>Here&#8217;s the minimum that I was hoping for:</p><ul><li>I wanted to be able to draw both a proportional sample (the more common choice, since it lets you generalize to the population as a whole) and a fixed-size sample (less common, but useful for making comparisons across groups).</li><li>I often use a seed when sampling, so I wanted that to be part of the function.</li><li>I wanted the output to be the same as if I had sampled from each group individually.</li><li>I was hoping that the output could be stored as a new object that I could then reuse (either a list or a data frame, preferably the latter).</li></ul><p>My initial searches led me to <a
href="http://yihui.name/r/stat/sampling_survey/stratified/index.htm" target="_blank">Yihui Xie&#8217;s page on stratified sampling using tapply()</a>. However, this approach did not meet my needs. As far as I could tell, it only allows a fixed sample size per group, and I wasn&#8217;t entirely satisfied with the output.</p><p>Consider the following. With Yihui Xie&#8217;s example data, the results from tapply() differ from the results you get by sampling from each group separately with the same seed.</p><div
class="wp_codebox"><table><tr
id="p114110"><td
class="code" id="p1141code10"><pre class="rsplus" style="font-family:monospace;"><span style="color: #080;">&gt;</span> dat <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">data.<span style="">frame</span></span><span style="color: #080;">&#40;</span>x <span style="color: #080;">=</span> <span style="color: #ff0000;">1</span><span style="color: #080;">:</span><span style="color: #ff0000;">15</span>, stratum <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">gl</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">3</span>, <span style="color: #ff0000;">5</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
<span style="color: #080;">&gt;</span> dat
    x stratum
<span style="color: #ff0000;">1</span>   <span style="color: #ff0000;">1</span>       <span style="color: #ff0000;">1</span>
<span style="color: #ff0000;">2</span>   <span style="color: #ff0000;">2</span>       <span style="color: #ff0000;">1</span>
<span style="color: #ff0000;">3</span>   <span style="color: #ff0000;">3</span>       <span style="color: #ff0000;">1</span>
<span style="color: #ff0000;">4</span>   <span style="color: #ff0000;">4</span>       <span style="color: #ff0000;">1</span>
<span style="color: #ff0000;">5</span>   <span style="color: #ff0000;">5</span>       <span style="color: #ff0000;">1</span>
<span style="color: #ff0000;">6</span>   <span style="color: #ff0000;">6</span>       <span style="color: #ff0000;">2</span>
<span style="color: #ff0000;">7</span>   <span style="color: #ff0000;">7</span>       <span style="color: #ff0000;">2</span>
<span style="color: #ff0000;">8</span>   <span style="color: #ff0000;">8</span>       <span style="color: #ff0000;">2</span>
<span style="color: #ff0000;">9</span>   <span style="color: #ff0000;">9</span>       <span style="color: #ff0000;">2</span>
<span style="color: #ff0000;">10</span> <span style="color: #ff0000;">10</span>       <span style="color: #ff0000;">2</span>
<span style="color: #ff0000;">11</span> <span style="color: #ff0000;">11</span>       <span style="color: #ff0000;">3</span>
<span style="color: #ff0000;">12</span> <span style="color: #ff0000;">12</span>       <span style="color: #ff0000;">3</span>
<span style="color: #ff0000;">13</span> <span style="color: #ff0000;">13</span>       <span style="color: #ff0000;">3</span>
<span style="color: #ff0000;">14</span> <span style="color: #ff0000;">14</span>       <span style="color: #ff0000;">3</span>
<span style="color: #ff0000;">15</span> <span style="color: #ff0000;">15</span>       <span style="color: #ff0000;">3</span>
<span style="color: #080;">&gt;</span> <span style="color: #0000FF; font-weight: bold;">set.<span style="">seed</span></span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">1</span><span style="color: #080;">&#41;</span><span style="color: #080;">;</span> <span style="color: #0000FF; font-weight: bold;">tapply</span><span style="color: #080;">&#40;</span>dat$x, dat$stratum, <span style="color: #0000FF; font-weight: bold;">sample</span>, size <span style="color: #080;">=</span> <span style="color: #ff0000;">3</span><span style="color: #080;">&#41;</span>
$`<span style="color: #ff0000;">1</span>`
<span style="color: #080;">&#91;</span><span style="color: #ff0000;">1</span><span style="color: #080;">&#93;</span> <span style="color: #ff0000;">2</span> <span style="color: #ff0000;">5</span> <span style="color: #ff0000;">4</span>
&nbsp;
$`<span style="color: #ff0000;">2</span>`
<span style="color: #080;">&#91;</span><span style="color: #ff0000;">1</span><span style="color: #080;">&#93;</span> <span style="color: #ff0000;">10</span>  <span style="color: #ff0000;">6</span>  <span style="color: #ff0000;">8</span>
&nbsp;
$`<span style="color: #ff0000;">3</span>`
<span style="color: #080;">&#91;</span><span style="color: #ff0000;">1</span><span style="color: #080;">&#93;</span> <span style="color: #ff0000;">15</span> <span style="color: #ff0000;">13</span> <span style="color: #ff0000;">12</span>
&nbsp;
<span style="color: #080;">&gt;</span> <span style="color: #228B22;"># Compare with what we get when we sample individually:</span>
<span style="color: #080;">&gt;</span> <span style="color: #0000FF; font-weight: bold;">set.<span style="">seed</span></span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">1</span><span style="color: #080;">&#41;</span><span style="color: #080;">;</span> <span style="color: #0000FF; font-weight: bold;">sample</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">1</span><span style="color: #080;">:</span><span style="color: #ff0000;">5</span>, <span style="color: #ff0000;">3</span><span style="color: #080;">&#41;</span>
<span style="color: #080;">&#91;</span><span style="color: #ff0000;">1</span><span style="color: #080;">&#93;</span> <span style="color: #ff0000;">2</span> <span style="color: #ff0000;">5</span> <span style="color: #ff0000;">4</span>
<span style="color: #080;">&gt;</span> <span style="color: #0000FF; font-weight: bold;">set.<span style="">seed</span></span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">1</span><span style="color: #080;">&#41;</span><span style="color: #080;">;</span> <span style="color: #0000FF; font-weight: bold;">sample</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">6</span><span style="color: #080;">:</span><span style="color: #ff0000;">10</span>, <span style="color: #ff0000;">3</span><span style="color: #080;">&#41;</span>
<span style="color: #080;">&#91;</span><span style="color: #ff0000;">1</span><span style="color: #080;">&#93;</span>  <span style="color: #ff0000;">7</span> <span style="color: #ff0000;">10</span>  <span style="color: #ff0000;">9</span>
<span style="color: #080;">&gt;</span> <span style="color: #0000FF; font-weight: bold;">set.<span style="">seed</span></span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">1</span><span style="color: #080;">&#41;</span><span style="color: #080;">;</span> <span style="color: #0000FF; font-weight: bold;">sample</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">11</span><span style="color: #080;">:</span><span style="color: #ff0000;">15</span>, <span style="color: #ff0000;">3</span><span style="color: #080;">&#41;</span>
<span style="color: #080;">&#91;</span><span style="color: #ff0000;">1</span><span style="color: #080;">&#93;</span> <span style="color: #ff0000;">12</span> <span style="color: #ff0000;">15</span> <span style="color: #ff0000;">14</span></pre></td></tr></table></div><p>Only the first group matches. I&#8217;m sure there&#8217;s an explanation for this, perhaps in how R generates its random numbers, but at the moment, that&#8217;s beyond my humble level of expertise.</p><h2>Stratified sampling, Mr. DWAB style&#8230;</h2><p>The solution I arrived at uses &#8220;unstack()&#8221; to split the cases into their groups, followed by a few if/else branches and for loops to take the samples, calling set.seed() again before each group when a seed is given.</p><p>And, without more rambling, here&#8217;s what I came up with.</p><div
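class="sketch"><p>A hedged aside on the transcript above, before the function itself. The most likely cause of the mismatch is that set.seed() is called only once before tapply(), so the first group uses the same random numbers as set.seed(1); sample(1:5, 3), while the second and third groups use the next numbers in the same stream. Calling set.seed() again before each group, as the function below does, reproduces the one-group-at-a-time results. The base R sketch that follows illustrates this; stratified_sketch() and its group argument are hypothetical names rather than the function from this post, and unlike that function it returns all sampled rows as a single data frame. Note that R 3.6.0 changed the default algorithm used by sample(), so on current versions the exact numbers will differ from the transcript.</p>
<pre class="rsplus" style="font-family:monospace;">
```r
# Base R only. stratified_sketch() is a hypothetical helper for illustration,
# not the stratified() function from this post.
dat = data.frame(x = 1:15, stratum = gl(3, 5))

# set.seed() runs once, then tapply() samples the groups in turn, so groups 2
# and 3 use later numbers from the same random stream.
set.seed(1)
via_tapply = tapply(dat$x, dat$stratum, sample, size = 3)

# Draw a stratified sample and return the chosen rows as one data frame.
#   group : name of the grouping column (an assumption of this sketch)
#   size  : a value of 1 or more is a fixed count per group; anything smaller
#           is treated as a proportion of each group (rounded)
#   seed  : NULL for no seed; otherwise set.seed(seed) is called before each
#           group, as if each group were sampled separately
stratified_sketch = function(df, group, size, seed = NULL) {
  rows_by_group = split(seq_len(nrow(df)), df[[group]])
  picked = lapply(rows_by_group, function(rows) {
    n = if (size >= 1) size else round(length(rows) * size)
    if (!is.null(seed)) set.seed(seed)
    # sample.int() avoids sample()'s special case for a length-one vector
    rows[sample.int(length(rows), n)]
  })
  df[unlist(picked, use.names = FALSE), , drop = FALSE]
}

# Usage: 3 cases per stratum, reseeded with 1 before each stratum
fixed = stratified_sketch(dat, "stratum", 3, seed = 1)
# Usage: 40% of each stratum (2 of 5), no seed
prop = stratified_sketch(dat, "stratum", 0.4)
```
</pre>
<p>One side effect of reseeding before every group, shared by the function below: groups of the same size get the same positions selected. In the transcript above, each individual sample picks the 2nd, 5th, and 4th case of its group.</p></div><div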
class="wp_codebox"><table><tr
id="p114111"><td
class="code" id="p1141code11"><pre class="rsplus" style="font-family:monospace;">stratified <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">function</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">df</span>, size, seed, dframe<span style="color: #080;">=</span>FALSE, ...<span style="color: #080;">&#41;</span> <span style="color: #080;">&#123;</span>
         <span style="color: #228B22;"># USE: Start with a data frame with your cases in the first column and</span>
         <span style="color: #228B22;"># your groups in the second column. Decide whether to use a seed.</span>
         <span style="color: #228B22;"># If not, seed should be &quot;NO&quot; (with quotes). Decide whether you want</span>
         <span style="color: #228B22;"># your output as a data frame; by default, dframe = FALSE.</span>
         <span style="color: #228B22;"># To take a sample proportional to the population size in each group,</span>
         <span style="color: #228B22;"># enter &quot;size&quot; as a decimal. Otherwise, enter size as a whole number.</span>
         <span style="color: #228B22;">#</span>
         <span style="color: #228B22;"># Example 1a: To sample 10% of each group from a data frame named &quot;z&quot;</span>
         <span style="color: #228B22;"># and using a seed of &quot;1&quot;, use: &gt; stratified(z, .1, 1)</span>
         <span style="color: #228B22;"># Example 1b: To run the same sample as above but display the result as</span>
         <span style="color: #228B22;"># a data frame, use: &gt; stratified(z, .1, 1, T)</span>
         <span style="color: #228B22;">#</span>
         <span style="color: #228B22;"># Example 2: To sample 10% of each group from a data frame named &quot;z&quot;</span>
         <span style="color: #228B22;"># and using no seed, use: &gt; stratified(z, .1, &quot;NO&quot;)</span>
         <span style="color: #228B22;">#</span>
         <span style="color: #228B22;"># Example 3: To sample 5 from each group from a data frame named &quot;z&quot;</span>
         <span style="color: #228B22;"># and using a seed of 30, use: &gt; stratified(z, 5, 30)</span>
         <span style="color: #228B22;">#</span>
         <span style="color: #228B22;"># NOTE: Not recommended for datasets with LOTS of groups or with HUGE</span>
         <span style="color: #228B22;"># differences in group sizes.</span>
  k <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">unstack</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">df</span><span style="color: #080;">&#41;</span>
  <span style="color: #0000FF; font-weight: bold;">if</span> <span style="color: #080;">&#40;</span>dframe <span style="color: #080;">==</span> FALSE<span style="color: #080;">&#41;</span> <span style="color: #080;">&#123;</span>
    <span style="color: #0000FF; font-weight: bold;">if</span> <span style="color: #080;">&#40;</span>seed <span style="color: #080;">==</span> <span style="color: #ff0000;">&quot;NO&quot;</span> <span style="color: #080;">&amp;</span> size <span style="color: #080;">&lt;</span> <span style="color: #ff0000;">1</span><span style="color: #080;">&#41;</span> <span style="color: #080;">&#123;</span>
      <span style="color: #0000FF; font-weight: bold;">for</span> <span style="color: #080;">&#40;</span>i <span style="color: #0000FF; font-weight: bold;">in</span> <span style="color: #ff0000;">1</span><span style="color: #080;">:</span><span style="color: #0000FF; font-weight: bold;">length</span><span style="color: #080;">&#40;</span>k<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span> <span style="color: #080;">&#123;</span>
        N <span style="color: #080;">=</span> k<span style="color: #080;">&#91;</span><span style="color: #080;">&#91;</span>i<span style="color: #080;">&#93;</span><span style="color: #080;">&#93;</span>
        n <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">round</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">length</span><span style="color: #080;">&#40;</span>N<span style="color: #080;">&#41;</span><span style="color: #080;">*</span>size<span style="color: #080;">&#41;</span>
        pre <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">structure</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">list</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;Group&quot;</span> <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">names</span><span style="color: #080;">&#40;</span>k<span style="color: #080;">&#91;</span>i<span style="color: #080;">&#93;</span><span style="color: #080;">&#41;</span>, <span style="color: #ff0000;">&quot;Population Size&quot;</span> <span style="color: #080;">=</span>
                             <span style="color: #0000FF; font-weight: bold;">length</span><span style="color: #080;">&#40;</span>k<span style="color: #080;">&#91;</span><span style="color: #080;">&#91;</span>i<span style="color: #080;">&#93;</span><span style="color: #080;">&#93;</span><span style="color: #080;">&#41;</span>, <span style="color: #ff0000;">&quot;Sample Size&quot;</span> <span style="color: #080;">=</span> n, Seed <span style="color: #080;">=</span> seed,
                             Sample <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">sample</span><span style="color: #080;">&#40;</span>N, n, ...<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>, <span style="color: #0000FF; font-weight: bold;">class</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">&quot;power.htest&quot;</span><span style="color: #080;">&#41;</span>
        <span style="color: #0000FF; font-weight: bold;">print</span><span style="color: #080;">&#40;</span>pre<span style="color: #080;">&#41;</span>
      <span style="color: #080;">&#125;</span>
    <span style="color: #080;">&#125;</span> <span style="color: #0000FF; font-weight: bold;">else</span> <span style="color: #0000FF; font-weight: bold;">if</span> <span style="color: #080;">&#40;</span>seed <span style="color: #080;">==</span> <span style="color: #ff0000;">&quot;NO&quot;</span> <span style="color: #080;">&amp;</span> size <span style="color: #080;">&gt;=</span> <span style="color: #ff0000;">1</span><span style="color: #080;">&#41;</span> <span style="color: #080;">&#123;</span>
      <span style="color: #0000FF; font-weight: bold;">for</span> <span style="color: #080;">&#40;</span>i <span style="color: #0000FF; font-weight: bold;">in</span> <span style="color: #ff0000;">1</span><span style="color: #080;">:</span><span style="color: #0000FF; font-weight: bold;">length</span><span style="color: #080;">&#40;</span>k<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span> <span style="color: #080;">&#123;</span>
        N <span style="color: #080;">=</span> k<span style="color: #080;">&#91;</span><span style="color: #080;">&#91;</span>i<span style="color: #080;">&#93;</span><span style="color: #080;">&#93;</span>
        pre <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">structure</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">list</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;Group&quot;</span> <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">names</span><span style="color: #080;">&#40;</span>k<span style="color: #080;">&#91;</span>i<span style="color: #080;">&#93;</span><span style="color: #080;">&#41;</span>, <span style="color: #ff0000;">&quot;Population Size&quot;</span> <span style="color: #080;">=</span>
                             <span style="color: #0000FF; font-weight: bold;">length</span><span style="color: #080;">&#40;</span>k<span style="color: #080;">&#91;</span><span style="color: #080;">&#91;</span>i<span style="color: #080;">&#93;</span><span style="color: #080;">&#93;</span><span style="color: #080;">&#41;</span>, <span style="color: #ff0000;">&quot;Sample Size&quot;</span> <span style="color: #080;">=</span> size, Seed <span style="color: #080;">=</span> seed,
                             Sample <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">sample</span><span style="color: #080;">&#40;</span>N, size, ...<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>,
                             <span style="color: #0000FF; font-weight: bold;">class</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">&quot;power.htest&quot;</span><span style="color: #080;">&#41;</span>
        <span style="color: #0000FF; font-weight: bold;">print</span><span style="color: #080;">&#40;</span>pre<span style="color: #080;">&#41;</span>
      <span style="color: #080;">&#125;</span>
    <span style="color: #080;">&#125;</span> <span style="color: #0000FF; font-weight: bold;">else</span> <span style="color: #0000FF; font-weight: bold;">if</span> <span style="color: #080;">&#40;</span>size <span style="color: #080;">&lt;</span> <span style="color: #ff0000;">1</span><span style="color: #080;">&#41;</span> <span style="color: #080;">&#123;</span>
      <span style="color: #0000FF; font-weight: bold;">for</span> <span style="color: #080;">&#40;</span>i <span style="color: #0000FF; font-weight: bold;">in</span> <span style="color: #ff0000;">1</span><span style="color: #080;">:</span><span style="color: #0000FF; font-weight: bold;">length</span><span style="color: #080;">&#40;</span>k<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span> <span style="color: #080;">&#123;</span>
        <span style="color: #0000FF; font-weight: bold;">set.<span style="">seed</span></span><span style="color: #080;">&#40;</span>seed<span style="color: #080;">&#41;</span>
        N <span style="color: #080;">=</span> k<span style="color: #080;">&#91;</span><span style="color: #080;">&#91;</span>i<span style="color: #080;">&#93;</span><span style="color: #080;">&#93;</span>
        n <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">round</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">length</span><span style="color: #080;">&#40;</span>N<span style="color: #080;">&#41;</span><span style="color: #080;">*</span>size<span style="color: #080;">&#41;</span>
        pre <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">structure</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">list</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;Group&quot;</span> <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">names</span><span style="color: #080;">&#40;</span>k<span style="color: #080;">&#91;</span>i<span style="color: #080;">&#93;</span><span style="color: #080;">&#41;</span>, <span style="color: #ff0000;">&quot;Population Size&quot;</span> <span style="color: #080;">=</span>
                             <span style="color: #0000FF; font-weight: bold;">length</span><span style="color: #080;">&#40;</span>k<span style="color: #080;">&#91;</span><span style="color: #080;">&#91;</span>i<span style="color: #080;">&#93;</span><span style="color: #080;">&#93;</span><span style="color: #080;">&#41;</span>, <span style="color: #ff0000;">&quot;Sample Size&quot;</span> <span style="color: #080;">=</span> n, Seed <span style="color: #080;">=</span> seed,
                             Sample <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">sample</span><span style="color: #080;">&#40;</span>N, n, ...<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>, <span style="color: #0000FF; font-weight: bold;">class</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">&quot;power.htest&quot;</span><span style="color: #080;">&#41;</span>
        <span style="color: #0000FF; font-weight: bold;">print</span><span style="color: #080;">&#40;</span>pre<span style="color: #080;">&#41;</span>
      <span style="color: #080;">&#125;</span>
    <span style="color: #080;">&#125;</span> <span style="color: #0000FF; font-weight: bold;">else</span> <span style="color: #0000FF; font-weight: bold;">if</span> <span style="color: #080;">&#40;</span>size <span style="color: #080;">&gt;=</span> <span style="color: #ff0000;">1</span><span style="color: #080;">&#41;</span> <span style="color: #080;">&#123;</span>
      <span style="color: #0000FF; font-weight: bold;">for</span> <span style="color: #080;">&#40;</span>i <span style="color: #0000FF; font-weight: bold;">in</span> <span style="color: #ff0000;">1</span><span style="color: #080;">:</span><span style="color: #0000FF; font-weight: bold;">length</span><span style="color: #080;">&#40;</span>k<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span> <span style="color: #080;">&#123;</span>
        <span style="color: #0000FF; font-weight: bold;">set.<span style="">seed</span></span><span style="color: #080;">&#40;</span>seed<span style="color: #080;">&#41;</span>
        N <span style="color: #080;">=</span> k<span style="color: #080;">&#91;</span><span style="color: #080;">&#91;</span>i<span style="color: #080;">&#93;</span><span style="color: #080;">&#93;</span>
        pre <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">structure</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">list</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;Group&quot;</span> <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">names</span><span style="color: #080;">&#40;</span>k<span style="color: #080;">&#91;</span>i<span style="color: #080;">&#93;</span><span style="color: #080;">&#41;</span>, <span style="color: #ff0000;">&quot;Population Size&quot;</span> <span style="color: #080;">=</span>
                             <span style="color: #0000FF; font-weight: bold;">length</span><span style="color: #080;">&#40;</span>k<span style="color: #080;">&#91;</span><span style="color: #080;">&#91;</span>i<span style="color: #080;">&#93;</span><span style="color: #080;">&#93;</span><span style="color: #080;">&#41;</span>, <span style="color: #ff0000;">&quot;Sample Size&quot;</span> <span style="color: #080;">=</span> size, Seed <span style="color: #080;">=</span> seed,
                             Sample <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">sample</span><span style="color: #080;">&#40;</span>N, size, ...<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>,
                             <span style="color: #0000FF; font-weight: bold;">class</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">&quot;power.htest&quot;</span><span style="color: #080;">&#41;</span>
        <span style="color: #0000FF; font-weight: bold;">print</span><span style="color: #080;">&#40;</span>pre<span style="color: #080;">&#41;</span>
      <span style="color: #080;">&#125;</span>
    <span style="color: #080;">&#125;</span>
  <span style="color: #080;">&#125;</span> <span style="color: #0000FF; font-weight: bold;">else</span> <span style="color: #080;">&#123;</span>
    <span style="color: #0000FF; font-weight: bold;">if</span> <span style="color: #080;">&#40;</span>seed <span style="color: #080;">==</span> <span style="color: #ff0000;">&quot;NO&quot;</span> <span style="color: #080;">&amp;</span> size <span style="color: #080;">&lt;</span> <span style="color: #ff0000;">1</span><span style="color: #080;">&#41;</span> <span style="color: #080;">&#123;</span>
      <span style="color: #0000FF; font-weight: bold;">for</span> <span style="color: #080;">&#40;</span>i <span style="color: #0000FF; font-weight: bold;">in</span> <span style="color: #ff0000;">1</span><span style="color: #080;">:</span><span style="color: #0000FF; font-weight: bold;">length</span><span style="color: #080;">&#40;</span>k<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span> <span style="color: #080;">&#123;</span>
        N <span style="color: #080;">=</span> k<span style="color: #080;">&#91;</span><span style="color: #080;">&#91;</span>i<span style="color: #080;">&#93;</span><span style="color: #080;">&#93;</span>
        n <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">round</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">length</span><span style="color: #080;">&#40;</span>N<span style="color: #080;">&#41;</span><span style="color: #080;">*</span>size<span style="color: #080;">&#41;</span>
        res <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">data.<span style="">frame</span></span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">names</span><span style="color: #080;">&#40;</span>k<span style="color: #080;">&#91;</span>i<span style="color: #080;">&#93;</span><span style="color: #080;">&#41;</span>, <span style="color: #0000FF; font-weight: bold;">sample</span><span style="color: #080;">&#40;</span>N, n, ...<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
        <span style="color: #0000FF; font-weight: bold;">names</span><span style="color: #080;">&#40;</span>res<span style="color: #080;">&#41;</span> <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">c</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;Group&quot;</span>, <span style="color: #ff0000;">&quot;Samples&quot;</span><span style="color: #080;">&#41;</span>
        <span style="color: #0000FF; font-weight: bold;">print</span><span style="color: #080;">&#40;</span>res<span style="color: #080;">&#41;</span>
      <span style="color: #080;">&#125;</span>
    <span style="color: #080;">&#125;</span> <span style="color: #0000FF; font-weight: bold;">else</span> <span style="color: #0000FF; font-weight: bold;">if</span> <span style="color: #080;">&#40;</span>seed <span style="color: #080;">==</span> <span style="color: #ff0000;">&quot;NO&quot;</span> <span style="color: #080;">&amp;</span> size <span style="color: #080;">&gt;=</span> <span style="color: #ff0000;">1</span><span style="color: #080;">&#41;</span> <span style="color: #080;">&#123;</span>
      <span style="color: #0000FF; font-weight: bold;">for</span> <span style="color: #080;">&#40;</span>i <span style="color: #0000FF; font-weight: bold;">in</span> <span style="color: #ff0000;">1</span><span style="color: #080;">:</span><span style="color: #0000FF; font-weight: bold;">length</span><span style="color: #080;">&#40;</span>k<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span> <span style="color: #080;">&#123;</span>
        N <span style="color: #080;">=</span> k<span style="color: #080;">&#91;</span><span style="color: #080;">&#91;</span>i<span style="color: #080;">&#93;</span><span style="color: #080;">&#93;</span>
        res <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">data.<span style="">frame</span></span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">names</span><span style="color: #080;">&#40;</span>k<span style="color: #080;">&#91;</span>i<span style="color: #080;">&#93;</span><span style="color: #080;">&#41;</span>, <span style="color: #0000FF; font-weight: bold;">sample</span><span style="color: #080;">&#40;</span>N, size, ...<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
        <span style="color: #0000FF; font-weight: bold;">names</span><span style="color: #080;">&#40;</span>res<span style="color: #080;">&#41;</span> <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">c</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;Group&quot;</span>, <span style="color: #ff0000;">&quot;Samples&quot;</span><span style="color: #080;">&#41;</span>
        <span style="color: #0000FF; font-weight: bold;">print</span><span style="color: #080;">&#40;</span>res<span style="color: #080;">&#41;</span>
      <span style="color: #080;">&#125;</span>
    <span style="color: #080;">&#125;</span> <span style="color: #0000FF; font-weight: bold;">else</span> <span style="color: #0000FF; font-weight: bold;">if</span> <span style="color: #080;">&#40;</span>size <span style="color: #080;">&lt;</span> <span style="color: #ff0000;">1</span><span style="color: #080;">&#41;</span> <span style="color: #080;">&#123;</span>
      <span style="color: #0000FF; font-weight: bold;">for</span> <span style="color: #080;">&#40;</span>i <span style="color: #0000FF; font-weight: bold;">in</span> <span style="color: #ff0000;">1</span><span style="color: #080;">:</span><span style="color: #0000FF; font-weight: bold;">length</span><span style="color: #080;">&#40;</span>k<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span> <span style="color: #080;">&#123;</span>
        <span style="color: #0000FF; font-weight: bold;">set.<span style="">seed</span></span><span style="color: #080;">&#40;</span>seed<span style="color: #080;">&#41;</span>
        N <span style="color: #080;">=</span> k<span style="color: #080;">&#91;</span><span style="color: #080;">&#91;</span>i<span style="color: #080;">&#93;</span><span style="color: #080;">&#93;</span>
        n <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">round</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">length</span><span style="color: #080;">&#40;</span>N<span style="color: #080;">&#41;</span><span style="color: #080;">*</span>size<span style="color: #080;">&#41;</span>
        res <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">data.<span style="">frame</span></span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">names</span><span style="color: #080;">&#40;</span>k<span style="color: #080;">&#91;</span>i<span style="color: #080;">&#93;</span><span style="color: #080;">&#41;</span>, <span style="color: #0000FF; font-weight: bold;">sample</span><span style="color: #080;">&#40;</span>N, n, ...<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
        <span style="color: #0000FF; font-weight: bold;">names</span><span style="color: #080;">&#40;</span>res<span style="color: #080;">&#41;</span> <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">c</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;Group&quot;</span>, <span style="color: #ff0000;">&quot;Samples&quot;</span><span style="color: #080;">&#41;</span>
        <span style="color: #0000FF; font-weight: bold;">print</span><span style="color: #080;">&#40;</span>res<span style="color: #080;">&#41;</span>
      <span style="color: #080;">&#125;</span>
    <span style="color: #080;">&#125;</span> <span style="color: #0000FF; font-weight: bold;">else</span> <span style="color: #0000FF; font-weight: bold;">if</span> <span style="color: #080;">&#40;</span>size <span style="color: #080;">&gt;=</span> <span style="color: #ff0000;">1</span><span style="color: #080;">&#41;</span> <span style="color: #080;">&#123;</span>
      <span style="color: #0000FF; font-weight: bold;">for</span> <span style="color: #080;">&#40;</span>i <span style="color: #0000FF; font-weight: bold;">in</span> <span style="color: #ff0000;">1</span><span style="color: #080;">:</span><span style="color: #0000FF; font-weight: bold;">length</span><span style="color: #080;">&#40;</span>k<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span> <span style="color: #080;">&#123;</span>
        <span style="color: #0000FF; font-weight: bold;">set.<span style="">seed</span></span><span style="color: #080;">&#40;</span>seed<span style="color: #080;">&#41;</span>
        N <span style="color: #080;">=</span> k<span style="color: #080;">&#91;</span><span style="color: #080;">&#91;</span>i<span style="color: #080;">&#93;</span><span style="color: #080;">&#93;</span>
        res <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">data.<span style="">frame</span></span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">names</span><span style="color: #080;">&#40;</span>k<span style="color: #080;">&#91;</span>i<span style="color: #080;">&#93;</span><span style="color: #080;">&#41;</span>, <span style="color: #0000FF; font-weight: bold;">sample</span><span style="color: #080;">&#40;</span>N, size, ...<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
        <span style="color: #0000FF; font-weight: bold;">names</span><span style="color: #080;">&#40;</span>res<span style="color: #080;">&#41;</span> <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">c</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;Group&quot;</span>, <span style="color: #ff0000;">&quot;Samples&quot;</span><span style="color: #080;">&#41;</span>
        <span style="color: #0000FF; font-weight: bold;">print</span><span style="color: #080;">&#40;</span>res<span style="color: #080;">&#41;</span>
      <span style="color: #080;">&#125;</span>
    <span style="color: #080;">&#125;</span>
  <span style="color: #080;">&#125;</span>
<span style="color: #080;">&#125;</span></pre></td></tr></table></div><p>You can load the function by typing:</p><div
class="wp_codebox"><table><tr
id="p114112"><td
class="code" id="p1141code12"><pre class="rsplus" style="font-family:monospace;"><span style="color: #080;">&gt;</span> <span style="color: #0000FF; font-weight: bold;">source</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;http://news.mrdwab.com/stratified-beta&quot;</span><span style="color: #080;">&#41;</span></pre></td></tr></table></div><h2>And now, to test it&#8230;</h2><p>Let&#8217;s generate some dummy data and see what we get. The function takes these arguments, in this order:</p><ul><li><code>df</code>: The source data frame. The first column must contain the IDs and the second column the groups.</li><li><code>size</code>: The sample size you want. Give it either as a proportion expressed as a decimal (for proportional sampling, e.g. <code>.15</code> for 15% of each group) or as a whole number of cases to draw from each group.</li><li><code>seed</code>: The seed to use. The function calls <code>set.seed()</code> with this value before sampling each group. If you don&#8217;t want the function to set a seed, enter <code>"NO"</code> (with quotes).</li><li><code>dframe</code>: The output format, either a list or a data frame. It defaults to a list (<code>dframe=FALSE</code>). That is the better choice at the moment because the data frame option does not yet work the way I expect it to.</li><li><code>...</code>: Any further arguments, such as <code>replace=TRUE</code>, which are passed on to <code>sample()</code>.</li></ul><div
class="wp_codebox"><table><tr
id="p114113"><td
class="code" id="p1141code13"><pre class="rsplus" style="font-family:monospace;"><span style="color: #080;">&gt;</span> <span style="color: #228B22;"># Generate some data</span>
<span style="color: #080;">&gt;</span> a <span style="color: #080;">=</span> <span style="color: #ff0000;">1</span><span style="color: #080;">:</span><span style="color: #ff0000;">100</span>
<span style="color: #080;">&gt;</span> <span style="color: #0000FF; font-weight: bold;">set.<span style="">seed</span></span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">123</span><span style="color: #080;">&#41;</span>
<span style="color: #080;">&gt;</span> b <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">sample</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">c</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;a&quot;</span>, <span style="color: #ff0000;">&quot;b&quot;</span>, <span style="color: #ff0000;">&quot;c&quot;</span>, <span style="color: #ff0000;">&quot;d&quot;</span><span style="color: #080;">&#41;</span>, <span style="color: #ff0000;">100</span>, <span style="color: #0000FF; font-weight: bold;">replace</span> <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">TRUE</span><span style="color: #080;">&#41;</span>
<span style="color: #080;">&gt;</span> z <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">data.<span style="">frame</span></span><span style="color: #080;">&#40;</span>a, b<span style="color: #080;">&#41;</span>
<span style="color: #080;">&gt;</span> <span style="color: #228B22;"># Check how big each group is</span>
<span style="color: #080;">&gt;</span> <span style="color: #0000FF; font-weight: bold;">table</span><span style="color: #080;">&#40;</span>z$b<span style="color: #080;">&#41;</span>
&nbsp;
 a  b  <span style="color: #0000FF; font-weight: bold;">c</span>  d
<span style="color: #ff0000;">26</span> <span style="color: #ff0000;">27</span> <span style="color: #ff0000;">20</span> <span style="color: #ff0000;">27</span>
<span style="color: #080;">&gt;</span> <span style="color: #228B22;"># Make sure the function is loaded before you continue!</span>
<span style="color: #080;">&gt;</span> <span style="color: #228B22;"># source(&quot;http://news.mrdwab.com/stratified-beta&quot;)</span>
<span style="color: #080;">&gt;</span> <span style="color: #228B22;"># Take a 15% sample and use a seed of 1</span>
<span style="color: #080;">&gt;</span> stratified<span style="color: #080;">&#40;</span>z, .15, <span style="color: #ff0000;">1</span><span style="color: #080;">&#41;</span>
&nbsp;
&nbsp;
&nbsp;
          Group <span style="color: #080;">=</span> a
Population Size <span style="color: #080;">=</span> <span style="color: #ff0000;">26</span>
    Sample Size <span style="color: #080;">=</span> <span style="color: #ff0000;">4</span>
           Seed <span style="color: #080;">=</span> <span style="color: #ff0000;">1</span>
         Sample <span style="color: #080;">=</span> <span style="color: #ff0000;">38</span>, <span style="color: #ff0000;">45</span>, <span style="color: #ff0000;">54</span>, <span style="color: #ff0000;">81</span>
&nbsp;
&nbsp;
&nbsp;
&nbsp;
          Group <span style="color: #080;">=</span> b
Population Size <span style="color: #080;">=</span> <span style="color: #ff0000;">27</span>
    Sample Size <span style="color: #080;">=</span> <span style="color: #ff0000;">4</span>
           Seed <span style="color: #080;">=</span> <span style="color: #ff0000;">1</span>
         Sample <span style="color: #080;">=</span> <span style="color: #ff0000;">39</span>, <span style="color: #ff0000;">43</span>, <span style="color: #ff0000;">60</span>, <span style="color: #ff0000;">79</span>
&nbsp;
&nbsp;
&nbsp;
&nbsp;
          Group <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">c</span>
Population Size <span style="color: #080;">=</span> <span style="color: #ff0000;">20</span>
    Sample Size <span style="color: #080;">=</span> <span style="color: #ff0000;">3</span>
           Seed <span style="color: #080;">=</span> <span style="color: #ff0000;">1</span>
         Sample <span style="color: #080;">=</span> <span style="color: #ff0000;">23</span>, <span style="color: #ff0000;">26</span>, <span style="color: #ff0000;">33</span>
&nbsp;
&nbsp;
&nbsp;
&nbsp;
          Group <span style="color: #080;">=</span> d
Population Size <span style="color: #080;">=</span> <span style="color: #ff0000;">27</span>
    Sample Size <span style="color: #080;">=</span> <span style="color: #ff0000;">4</span>
           Seed <span style="color: #080;">=</span> <span style="color: #ff0000;">1</span>
         Sample <span style="color: #080;">=</span> <span style="color: #ff0000;">21</span>, <span style="color: #ff0000;">31</span>, <span style="color: #ff0000;">53</span>, <span style="color: #ff0000;">71</span>
&nbsp;
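# --- Sketch, not part of the original post ---
# For a size below 1, the code above computes n = round(length(N) * size)
# for each group N. With the group counts from table(z$b) above (typed in
# here so this runs on its own), a 15% sample works out as:
group.counts = c(a = 26, b = 27, c = 20, d = 27)
prop.sizes = round(group.counts * 0.15)
prop.sizes  # a = 4, b = 4, c = 3, d = 4: the "Sample Size" lines above
# Caveat: R's round() goes to the even digit on an exact .5, so 25% of a
# group of 10 gives round(2.5), which is 2, not 3
round(10 * 0.25)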
<span style="color: #080;">&gt;</span> <span style="color: #228B22;"># Take a sample of 5 from each group and use a seed of 1</span>
<span style="color: #080;">&gt;</span> stratified<span style="color: #080;">&#40;</span>z, <span style="color: #ff0000;">5</span>, <span style="color: #ff0000;">1</span><span style="color: #080;">&#41;</span>
&nbsp;
&nbsp;
&nbsp;
          Group <span style="color: #080;">=</span> a
Population Size <span style="color: #080;">=</span> <span style="color: #ff0000;">26</span>
    Sample Size <span style="color: #080;">=</span> <span style="color: #ff0000;">5</span>
           Seed <span style="color: #080;">=</span> <span style="color: #ff0000;">1</span>
         Sample <span style="color: #080;">=</span> <span style="color: #ff0000;">38</span>, <span style="color: #ff0000;">45</span>, <span style="color: #ff0000;">54</span>, <span style="color: #ff0000;">81</span>, <span style="color: #ff0000;">30</span>
&nbsp;
&nbsp;
&nbsp;
&nbsp;
          Group <span style="color: #080;">=</span> b
Population Size <span style="color: #080;">=</span> <span style="color: #ff0000;">27</span>
    Sample Size <span style="color: #080;">=</span> <span style="color: #ff0000;">5</span>
           Seed <span style="color: #080;">=</span> <span style="color: #ff0000;">1</span>
         Sample <span style="color: #080;">=</span> <span style="color: #ff0000;">39</span>, <span style="color: #ff0000;">43</span>, <span style="color: #ff0000;">60</span>, <span style="color: #ff0000;">79</span>, <span style="color: #ff0000;">19</span>
&nbsp;
&nbsp;
&nbsp;
&nbsp;
          Group <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">c</span>
Population Size <span style="color: #080;">=</span> <span style="color: #ff0000;">20</span>
    Sample Size <span style="color: #080;">=</span> <span style="color: #ff0000;">5</span>
           Seed <span style="color: #080;">=</span> <span style="color: #ff0000;">1</span>
         Sample <span style="color: #080;">=</span> <span style="color: #ff0000;">23</span>, <span style="color: #ff0000;">26</span>, <span style="color: #ff0000;">33</span>, <span style="color: #ff0000;">78</span>, <span style="color: #ff0000;">14</span>
&nbsp;
&nbsp;
&nbsp;
&nbsp;
          Group <span style="color: #080;">=</span> d
Population Size <span style="color: #080;">=</span> <span style="color: #ff0000;">27</span>
    Sample Size <span style="color: #080;">=</span> <span style="color: #ff0000;">5</span>
           Seed <span style="color: #080;">=</span> <span style="color: #ff0000;">1</span>
         Sample <span style="color: #080;">=</span> <span style="color: #ff0000;">21</span>, <span style="color: #ff0000;">31</span>, <span style="color: #ff0000;">53</span>, <span style="color: #ff0000;">71</span>, <span style="color: #ff0000;">11</span>
&nbsp;
<span style="color: #080;">&gt;</span> <span style="color: #228B22;"># Take a sample of 15 from each group, with replacement, and a seed of 1</span>
<span style="color: #080;">&gt;</span> stratified<span style="color: #080;">&#40;</span>z, <span style="color: #ff0000;">15</span>, <span style="color: #ff0000;">1</span>, <span style="color: #0000FF; font-weight: bold;">replace</span><span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">TRUE</span><span style="color: #080;">&#41;</span>
&nbsp;
&nbsp;
&nbsp;
          Group <span style="color: #080;">=</span> a
Population Size <span style="color: #080;">=</span> <span style="color: #ff0000;">26</span>
    Sample Size <span style="color: #080;">=</span> <span style="color: #ff0000;">15</span>
           Seed <span style="color: #080;">=</span> <span style="color: #ff0000;">1</span>
         Sample <span style="color: #080;">=</span> <span style="color: #ff0000;">38</span>, <span style="color: #ff0000;">45</span>, <span style="color: #ff0000;">56</span>, <span style="color: #ff0000;">91</span>, <span style="color: #ff0000;">35</span>, <span style="color: #ff0000;">91</span>, <span style="color: #ff0000;">96</span>, <span style="color: #ff0000;">74</span>, <span style="color: #ff0000;">62</span>, <span style="color: #ff0000;">15</span>, <span style="color: #ff0000;">35</span>, <span style="color: #ff0000;">30</span>, <span style="color: #ff0000;">74</span>, <span style="color: #ff0000;">45</span>, <span style="color: #ff0000;">81</span>
&nbsp;
&nbsp;
&nbsp;
&nbsp;
          Group <span style="color: #080;">=</span> b
Population Size <span style="color: #080;">=</span> <span style="color: #ff0000;">27</span>
    Sample Size <span style="color: #080;">=</span> <span style="color: #ff0000;">15</span>
           Seed <span style="color: #080;">=</span> <span style="color: #ff0000;">1</span>
         Sample <span style="color: #080;">=</span> <span style="color: #ff0000;">39</span>, <span style="color: #ff0000;">44</span>, <span style="color: #ff0000;">63</span>, <span style="color: #ff0000;">93</span>, <span style="color: #ff0000;">29</span>, <span style="color: #ff0000;">93</span>, <span style="color: #ff0000;">95</span>, <span style="color: #ff0000;">66</span>, <span style="color: #ff0000;">64</span>, <span style="color: #ff0000;">3</span>, <span style="color: #ff0000;">29</span>, <span style="color: #ff0000;">19</span>, <span style="color: #ff0000;">70</span>, <span style="color: #ff0000;">44</span>, <span style="color: #ff0000;">77</span>
&nbsp;
&nbsp;
&nbsp;
&nbsp;
          Group <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">c</span>
Population Size <span style="color: #080;">=</span> <span style="color: #ff0000;">20</span>
    Sample Size <span style="color: #080;">=</span> <span style="color: #ff0000;">15</span>
           Seed <span style="color: #080;">=</span> <span style="color: #ff0000;">1</span>
         Sample <span style="color: #080;">=</span> <span style="color: #ff0000;">23</span>, <span style="color: #ff0000;">26</span>, <span style="color: #ff0000;">55</span>, <span style="color: #ff0000;">94</span>, <span style="color: #ff0000;">22</span>, <span style="color: #ff0000;">92</span>, <span style="color: #ff0000;">94</span>, <span style="color: #ff0000;">72</span>, <span style="color: #ff0000;">61</span>, <span style="color: #ff0000;">9</span>, <span style="color: #ff0000;">22</span>, <span style="color: #ff0000;">14</span>, <span style="color: #ff0000;">72</span>, <span style="color: #ff0000;">26</span>, <span style="color: #ff0000;">78</span>
&nbsp;
&nbsp;
&nbsp;
&nbsp;
          Group <span style="color: #080;">=</span> d
Population Size <span style="color: #080;">=</span> <span style="color: #ff0000;">27</span>
    Sample Size <span style="color: #080;">=</span> <span style="color: #ff0000;">15</span>
           Seed <span style="color: #080;">=</span> <span style="color: #ff0000;">1</span>
         Sample <span style="color: #080;">=</span> <span style="color: #ff0000;">21</span>, <span style="color: #ff0000;">32</span>, <span style="color: #ff0000;">58</span>, <span style="color: #ff0000;">88</span>, <span style="color: #ff0000;">16</span>, <span style="color: #ff0000;">88</span>, <span style="color: #ff0000;">89</span>, <span style="color: #ff0000;">65</span>, <span style="color: #ff0000;">59</span>, <span style="color: #ff0000;">4</span>, <span style="color: #ff0000;">16</span>, <span style="color: #ff0000;">11</span>, <span style="color: #ff0000;">67</span>, <span style="color: #ff0000;">32</span>, <span style="color: #ff0000;">69</span>
&nbsp;
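# --- Sketch, not part of the original post ---
# Judging from the code above, the dframe = TRUE branch only print()s each
# group's data frame inside a for loop, and a for loop returns NULL, so
# nothing is left to store. One way around that, assuming (as the function
# does) that column 1 holds the IDs and column 2 the groups: build each
# group's data frame in lapply() and rbind() them once at the end.
# stratified.df() is a hypothetical name; seed = NULL means "no seed".
stratified.df = function(df, size, seed = NULL, ...) {
  k = split(df[[1]], df[[2]])
  pieces = lapply(names(k), function(g) {
    N = k[[g]]
    # Same rule as the original: a decimal is a proportion of the group
    n = if (size >= 1) size else round(length(N) * size)
    if (!is.null(seed)) set.seed(seed)  # re-seed per group, like stratified()
    # Index with sample.int() so a group holding a single ID is not
    # treated by sample() as "sample from 1:ID"
    data.frame(Group = rep(g, n), Samples = N[sample.int(length(N), n, ...)])
  })
  do.call(rbind, pieces)
}
# Usage: x = stratified.df(z, .1, 1) keeps every group's samples in x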
<span style="color: #080;">&gt;</span> <span style="color: #228B22;"># Take a sample of 10% from each group, using a seed of 1,</span>
<span style="color: #080;">&gt;</span> <span style="color: #228B22;"># and display the output as a data frame</span>
<span style="color: #080;">&gt;</span> stratified<span style="color: #080;">&#40;</span>z, .1, <span style="color: #ff0000;">1</span>, dframe<span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">TRUE</span><span style="color: #080;">&#41;</span>
  Group Samples
<span style="color: #ff0000;">1</span>     a      <span style="color: #ff0000;">38</span>
<span style="color: #ff0000;">2</span>     a      <span style="color: #ff0000;">45</span>
<span style="color: #ff0000;">3</span>     a      <span style="color: #ff0000;">54</span>
  Group Samples
<span style="color: #ff0000;">1</span>     b      <span style="color: #ff0000;">39</span>
<span style="color: #ff0000;">2</span>     b      <span style="color: #ff0000;">43</span>
<span style="color: #ff0000;">3</span>     b      <span style="color: #ff0000;">60</span>
  Group Samples
<span style="color: #ff0000;">1</span>     <span style="color: #0000FF; font-weight: bold;">c</span>      <span style="color: #ff0000;">23</span>
<span style="color: #ff0000;">2</span>     <span style="color: #0000FF; font-weight: bold;">c</span>      <span style="color: #ff0000;">26</span>
  Group Samples
<span style="color: #ff0000;">1</span>     d      <span style="color: #ff0000;">21</span>
<span style="color: #ff0000;">2</span>     d      <span style="color: #ff0000;">31</span>
<span style="color: #ff0000;">3</span>     d      <span style="color: #ff0000;">53</span></pre></td></tr></table></div><h2>Replicating the results from tapply()</h2><p>I mentioned earlier that the results differ from what you would get with the <code>tapply()</code> function. The reason is that <code>stratified()</code> calls <code>set.seed(seed)</code> again before sampling each group, so every group starts from the same point in the random number stream. <code>tapply()</code>, by contrast, draws each group&#8217;s sample one after another from a single stream. To get the same results as <code>tapply()</code>, move the seed outside the function: enter the seed as <code>"NO"</code> (with quotes) and call <code>set.seed()</code> beforehand, as you normally would.</p><div
class="wp_codebox"><table><tr
id="p114114"><td
class="code" id="p1141code14"><pre class="rsplus" style="font-family:monospace;"><span style="color: #080;">&gt;</span> <span style="color: #228B22;"># See what tapply() gives us</span>
<span style="color: #080;">&gt;</span> <span style="color: #0000FF; font-weight: bold;">set.<span style="">seed</span></span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">1</span><span style="color: #080;">&#41;</span><span style="color: #080;">;</span> <span style="color: #0000FF; font-weight: bold;">tapply</span><span style="color: #080;">&#40;</span>z$a, z$b, <span style="color: #0000FF; font-weight: bold;">sample</span>, size <span style="color: #080;">=</span> <span style="color: #ff0000;">4</span><span style="color: #080;">&#41;</span>
$a
<span style="color: #080;">&#91;</span><span style="color: #ff0000;">1</span><span style="color: #080;">&#93;</span> <span style="color: #ff0000;">38</span> <span style="color: #ff0000;">45</span> <span style="color: #ff0000;">54</span> <span style="color: #ff0000;">81</span>
&nbsp;
$b
<span style="color: #080;">&#91;</span><span style="color: #ff0000;">1</span><span style="color: #080;">&#93;</span> <span style="color: #ff0000;">29</span> <span style="color: #ff0000;">86</span> <span style="color: #ff0000;">95</span> <span style="color: #ff0000;">63</span>
&nbsp;
$c
<span style="color: #080;">&#91;</span><span style="color: #ff0000;">1</span><span style="color: #080;">&#93;</span> <span style="color: #ff0000;">61</span>  <span style="color: #ff0000;">9</span> <span style="color: #ff0000;">14</span> <span style="color: #ff0000;">92</span>
&nbsp;
$d
<span style="color: #080;">&#91;</span><span style="color: #ff0000;">1</span><span style="color: #080;">&#93;</span> <span style="color: #ff0000;">67</span> <span style="color: #ff0000;">31</span> <span style="color: #ff0000;">68</span> <span style="color: #ff0000;">34</span>
&nbsp;
<span style="color: #080;">&gt;</span> <span style="color: #228B22;"># The normal usage for the stratified function</span>
<span style="color: #080;">&gt;</span> stratified<span style="color: #080;">&#40;</span>z, <span style="color: #ff0000;">4</span>, <span style="color: #ff0000;">1</span>, dframe <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">TRUE</span><span style="color: #080;">&#41;</span>
  Group Samples
<span style="color: #ff0000;">1</span>     a      <span style="color: #ff0000;">38</span>
<span style="color: #ff0000;">2</span>     a      <span style="color: #ff0000;">45</span>
<span style="color: #ff0000;">3</span>     a      <span style="color: #ff0000;">54</span>
<span style="color: #ff0000;">4</span>     a      <span style="color: #ff0000;">81</span>
  Group Samples
<span style="color: #ff0000;">1</span>     b      <span style="color: #ff0000;">39</span>
<span style="color: #ff0000;">2</span>     b      <span style="color: #ff0000;">43</span>
<span style="color: #ff0000;">3</span>     b      <span style="color: #ff0000;">60</span>
<span style="color: #ff0000;">4</span>     b      <span style="color: #ff0000;">79</span>
  Group Samples
<span style="color: #ff0000;">1</span>     <span style="color: #0000FF; font-weight: bold;">c</span>      <span style="color: #ff0000;">23</span>
<span style="color: #ff0000;">2</span>     <span style="color: #0000FF; font-weight: bold;">c</span>      <span style="color: #ff0000;">26</span>
<span style="color: #ff0000;">3</span>     <span style="color: #0000FF; font-weight: bold;">c</span>      <span style="color: #ff0000;">33</span>
<span style="color: #ff0000;">4</span>     <span style="color: #0000FF; font-weight: bold;">c</span>      <span style="color: #ff0000;">78</span>
  Group Samples
<span style="color: #ff0000;">1</span>     d      <span style="color: #ff0000;">21</span>
<span style="color: #ff0000;">2</span>     d      <span style="color: #ff0000;">31</span>
<span style="color: #ff0000;">3</span>     d      <span style="color: #ff0000;">53</span>
<span style="color: #ff0000;">4</span>     d      <span style="color: #ff0000;">71</span>
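# --- Sketch, not part of the original post ---
# stratified() calls set.seed(seed) before sampling each group, so every
# group starts from the same point in the random number stream. Re-seeding
# inside lapply() mimics that (assuming, as it appears, that the function
# splits column 1 by column 2). A single set.seed() before the whole run,
# which is what tapply() gets, draws all groups from one stream instead.
# The data are rebuilt so this runs on its own. Note that R 3.6.0 changed
# sample()'s default algorithm, so R 3.6.0 or later will generally give
# different numbers from this post unless RNGkind(sample.kind = "Rounding")
# is called first.
a = 1:100
set.seed(123)
b = sample(c("a", "b", "c", "d"), 100, replace = TRUE)
z = data.frame(a, b)
groups = split(z$a, z$b)
reseeded = lapply(groups, function(x) {set.seed(1); sample(x, 4)})
set.seed(1)
one.stream = lapply(groups, sample, size = 4)
# Only the first group is guaranteed to match, since both start at seed 1
identical(reseeded$a, one.stream$a)  # TRUE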
<span style="color: #080;">&gt;</span> <span style="color: #228B22;"># Getting the same results as tapply()</span>
<span style="color: #080;">&gt;</span> <span style="color: #228B22;"># Set the seed before using the function,</span>
<span style="color: #080;">&gt;</span> <span style="color: #228B22;"># and set the seed for the function as &quot;NO&quot;</span>
<span style="color: #080;">&gt;</span> <span style="color: #0000FF; font-weight: bold;">set.<span style="">seed</span></span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">1</span><span style="color: #080;">&#41;</span><span style="color: #080;">;</span> stratified<span style="color: #080;">&#40;</span>z, <span style="color: #ff0000;">4</span>, <span style="color: #ff0000;">&quot;NO&quot;</span>, dframe <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">TRUE</span><span style="color: #080;">&#41;</span>
  Group Samples
<span style="color: #ff0000;">1</span>     a      <span style="color: #ff0000;">38</span>
<span style="color: #ff0000;">2</span>     a      <span style="color: #ff0000;">45</span>
<span style="color: #ff0000;">3</span>     a      <span style="color: #ff0000;">54</span>
<span style="color: #ff0000;">4</span>     a      <span style="color: #ff0000;">81</span>
  Group Samples
<span style="color: #ff0000;">1</span>     b      <span style="color: #ff0000;">29</span>
<span style="color: #ff0000;">2</span>     b      <span style="color: #ff0000;">86</span>
<span style="color: #ff0000;">3</span>     b      <span style="color: #ff0000;">95</span>
<span style="color: #ff0000;">4</span>     b      <span style="color: #ff0000;">63</span>
  Group Samples
<span style="color: #ff0000;">1</span>     <span style="color: #0000FF; font-weight: bold;">c</span>      <span style="color: #ff0000;">61</span>
<span style="color: #ff0000;">2</span>     <span style="color: #0000FF; font-weight: bold;">c</span>       <span style="color: #ff0000;">9</span>
<span style="color: #ff0000;">3</span>     <span style="color: #0000FF; font-weight: bold;">c</span>      <span style="color: #ff0000;">14</span>
<span style="color: #ff0000;">4</span>     <span style="color: #0000FF; font-weight: bold;">c</span>      <span style="color: #ff0000;">92</span>
  Group Samples
<span style="color: #ff0000;">1</span>     d      <span style="color: #ff0000;">67</span>
<span style="color: #ff0000;">2</span>     d      <span style="color: #ff0000;">31</span>
<span style="color: #ff0000;">3</span>     d      <span style="color: #ff0000;">68</span>
<span style="color: #ff0000;">4</span>     d      <span style="color: #ff0000;">34</span></pre></td></tr></table></div><h2>The unfortunate&#8230;</h2><p>Each output format has its advantages. I&#8217;ve made the list output quite verbose, which is useful with proportional sampling because it shows how many cases were sampled from each group. The data frame output, on the other hand, is quite compact.</p><p>What I still need to figure out, though, is why R won&#8217;t store my output. I suspect it has something to do with how my loops are set up, and that I need to add something like an <code>rbind()</code> call somewhere.</p><p>Once I&#8217;ve worked it out, I will be sure to post what I&#8217;ve found.</p> ]]></content:encoded> <wfw:commentRss>http://news.mrdwab.com/2011/05/15/stratified-random-sampling-in-r-beta/feed/</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item><title>Reshaping data in R revisited</title><link>http://news.mrdwab.com/2011/04/18/reshaping-data-in-r-revisited/</link> <comments>http://news.mrdwab.com/2011/04/18/reshaping-data-in-r-revisited/#comments</comments> <pubDate>Mon, 18 Apr 2011 04:15:17 +0000</pubDate> <dc:creator>Ananda</dc:creator> <category><![CDATA[Geekiness]]></category> <category><![CDATA[Useless Knowledge]]></category> <category><![CDATA[code]]></category> <category><![CDATA[data manipulation]]></category> <category><![CDATA[R]]></category> <category><![CDATA[reshape]]></category> <category><![CDATA[Stata]]></category> <guid
isPermaLink="false">http://news.mrdwab.com/?p=1116</guid> <description><![CDATA[A year ago, I wrote a post about reshaping data from a wide format to a long format. I thought that considering how much time had passed, it would be good to revisit R&#8217;s in-built reshape functions. For these examples, I&#8217;ve copied the Stata examples from the UCLA Academic Technology Services&#8217;s &#8220;Reshape data wide to [...]]]></description> <content:encoded><![CDATA[<p>A year ago, I wrote a post about reshaping data from a wide format to a long format. Since a fair amount of time has passed, I thought it would be good to revisit R&#8217;s built-in reshape functions.</p><p>For these examples, I&#8217;ve copied the Stata examples from the UCLA Academic Technology Services&#8217;s <a
href="http://www.ats.ucla.edu/stat/stata/modules/reshapel.htm">&#8220;Reshape data wide to long&#8221;</a> page. Since the data are provided as Stata .dta files, you first need to load the &#8220;foreign&#8221; package to read them in R.</p><p><span
id="more-1116"></span></p><p>This first example is very basic. There are four variables: the first is the unique ID, and the remaining three are the measurements for three consecutive years.</p><p>The basic <code>reshape()</code> command needs you to specify the data being reshaped, the target &#8220;direction&#8221; (wide or long), and which variables are to be reshaped (the <code>varying</code> argument). By default, R expects the separating character (<code>sep</code>) to be a period. In other words, it expects your variables to be named in the form &#8220;faminc.96&#8221; and so on. However, your variables may be named in other ways, for example &#8220;faminc-96&#8221;, &#8220;faminc_96&#8221;, or (as in this example) &#8220;faminc96&#8221;. If your naming pattern differs from R&#8217;s default, you also need to specify the separating character. This dataset has no separating character, so you simply use <code>sep=""</code>.</p><div
class="wp_codebox_msgheader"><span
class="left">RSPLUS</span></div><div
class="wp_codebox"><table><tr
id="p111620"><td
class="code" id="p1116code20"><pre class="rsplus" style="font-family:monospace;"><span style="color: #080;">&gt;</span> <span style="color: #0000FF; font-weight: bold;">library</span><span style="color: #080;">&#40;</span>foreign<span style="color: #080;">&#41;</span> <span style="color: #228B22;"># Lets us use Stata files directly</span>
<span style="color: #080;">&gt;</span> <span style="color: #228B22;"># Example 1: Very basic reshape</span>
<span style="color: #080;">&gt;</span> <span style="color: #228B22;"># Use &quot;read.dta&quot; instead of &quot;read.csv&quot; or &quot;read.table&quot;</span>
<span style="color: #080;">&gt;</span> faminc <span style="color: #080;">=</span> read.<span style="">dta</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;http://www.ats.ucla.edu/stat/stata/modules/faminc.dta&quot;</span><span style="color: #080;">&#41;</span>
<span style="color: #080;">&gt;</span> faminc
  famid faminc96 faminc97 faminc98
<span style="color: #ff0000;">1</span>     <span style="color: #ff0000;">3</span>    <span style="color: #ff0000;">75000</span>    <span style="color: #ff0000;">76000</span>    <span style="color: #ff0000;">77000</span>
<span style="color: #ff0000;">2</span>     <span style="color: #ff0000;">1</span>    <span style="color: #ff0000;">40000</span>    <span style="color: #ff0000;">40500</span>    <span style="color: #ff0000;">41000</span>
<span style="color: #ff0000;">3</span>     <span style="color: #ff0000;">2</span>    <span style="color: #ff0000;">45000</span>    <span style="color: #ff0000;">45400</span>    <span style="color: #ff0000;">45800</span>
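# Sketch (not part of the original session): the three families typed in from
# the printout above, but with hypothetical underscore-separated column names
# (faminc_96 and so on) to show how a non-default sep is handled.
faminc.us = data.frame(famid = c(3, 1, 2),
                       faminc_96 = c(75000, 40000, 45000),
                       faminc_97 = c(76000, 40500, 45400),
                       faminc_98 = c(77000, 41000, 45800))
# sep = "_" tells reshape() to split each name at the underscore, giving the
# stub "faminc" and the times 96, 97, and 98.
l.us = reshape(faminc.us, direction = "long", varying = 2:4,
               sep = "_", idvar = "famid")
# Sort by family, then by year, as in the session below
l.us = l.us[order(l.us$famid, l.us$time), ]
print(l.us)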
<span style="color: #080;">&gt;</span> l.<span style="">faminc</span> <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">reshape</span><span style="color: #080;">&#40;</span>faminc, direction<span style="color: #080;">=</span><span style="color: #ff0000;">&quot;long&quot;</span>, varying<span style="color: #080;">=</span><span style="color: #ff0000;">2</span><span style="color: #080;">:</span><span style="color: #ff0000;">4</span>, sep<span style="color: #080;">=</span><span style="color: #ff0000;">&quot;&quot;</span>, idvar<span style="color: #080;">=</span><span style="color: #ff0000;">&quot;famid&quot;</span><span style="color: #080;">&#41;</span>
<span style="color: #080;">&gt;</span> l.<span style="">faminc</span><span style="color: #080;">&#91;</span><span style="color: #0000FF; font-weight: bold;">order</span><span style="color: #080;">&#40;</span>l.<span style="">faminc</span>$famid<span style="color: #080;">&#41;</span>,<span style="color: #080;">&#93;</span>
     famid <span style="color: #0000FF; font-weight: bold;">time</span> faminc
<span style="color: #ff0000;">1.96</span>     <span style="color: #ff0000;">1</span>   <span style="color: #ff0000;">96</span>  <span style="color: #ff0000;">40000</span>
<span style="color: #ff0000;">1.97</span>     <span style="color: #ff0000;">1</span>   <span style="color: #ff0000;">97</span>  <span style="color: #ff0000;">40500</span>
<span style="color: #ff0000;">1.98</span>     <span style="color: #ff0000;">1</span>   <span style="color: #ff0000;">98</span>  <span style="color: #ff0000;">41000</span>
<span style="color: #ff0000;">2.96</span>     <span style="color: #ff0000;">2</span>   <span style="color: #ff0000;">96</span>  <span style="color: #ff0000;">45000</span>
<span style="color: #ff0000;">2.97</span>     <span style="color: #ff0000;">2</span>   <span style="color: #ff0000;">97</span>  <span style="color: #ff0000;">45400</span>
<span style="color: #ff0000;">2.98</span>     <span style="color: #ff0000;">2</span>   <span style="color: #ff0000;">98</span>  <span style="color: #ff0000;">45800</span>
<span style="color: #ff0000;">3.96</span>     <span style="color: #ff0000;">3</span>   <span style="color: #ff0000;">96</span>  <span style="color: #ff0000;">75000</span>
<span style="color: #ff0000;">3.97</span>     <span style="color: #ff0000;">3</span>   <span style="color: #ff0000;">97</span>  <span style="color: #ff0000;">76000</span>
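<span style="color: #ff0000;">3.98</span>     <span style="color: #ff0000;">3</span>   <span style="color: #ff0000;">98</span>  <span style="color: #ff0000;">77000</span></pre></td></tr></table></div><p>In the second example from UCLA ATS, the unique identifier is the combination of the first two variables, <code>famid</code> and <code>birth</code>. You can pass both to <code>idvar</code>. <code>reshape()</code> combines them internally, which is why the row names in the output look like &#8220;1.1.1&#8221;. Strictly speaking, <code>idvar</code> is not always required: if you leave it out, <code>reshape()</code> adds a new column named &#8220;id&#8221; that numbers the original rows, and any columns not listed in <code>varying</code> are carried along unchanged.</p><div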
class="wp_codebox_msgheader"><span
class="left">RSPLUS</span></div><div
class="wp_codebox"><table><tr
id="p111621"><td
class="code" id="p1116code21"><pre class="rsplus" style="font-family:monospace;"><span style="color: #080;">&gt;</span> <span style="color: #228B22;"># Example 2: Two identifying variables</span>
<span style="color: #080;">&gt;</span> kidshtwt <span style="color: #080;">=</span> read.<span style="">dta</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;http://www.ats.ucla.edu/stat/stata/modules/kidshtwt.dta&quot;</span><span style="color: #080;">&#41;</span>
<span style="color: #080;">&gt;</span> kidshtwt
  famid birth ht1 ht2 wt1 wt2
<span style="color: #ff0000;">1</span>     <span style="color: #ff0000;">1</span>     <span style="color: #ff0000;">1</span> <span style="color: #ff0000;">2.8</span> <span style="color: #ff0000;">3.4</span>  <span style="color: #ff0000;">19</span>  <span style="color: #ff0000;">28</span>
<span style="color: #ff0000;">2</span>     <span style="color: #ff0000;">1</span>     <span style="color: #ff0000;">2</span> <span style="color: #ff0000;">2.9</span> <span style="color: #ff0000;">3.8</span>  <span style="color: #ff0000;">21</span>  <span style="color: #ff0000;">28</span>
<span style="color: #ff0000;">3</span>     <span style="color: #ff0000;">1</span>     <span style="color: #ff0000;">3</span> <span style="color: #ff0000;">2.2</span> <span style="color: #ff0000;">2.9</span>  <span style="color: #ff0000;">20</span>  <span style="color: #ff0000;">23</span>
<span style="color: #ff0000;">4</span>     <span style="color: #ff0000;">2</span>     <span style="color: #ff0000;">1</span> <span style="color: #ff0000;">2.0</span> <span style="color: #ff0000;">3.2</span>  <span style="color: #ff0000;">25</span>  <span style="color: #ff0000;">30</span>
<span style="color: #ff0000;">5</span>     <span style="color: #ff0000;">2</span>     <span style="color: #ff0000;">2</span> <span style="color: #ff0000;">1.8</span> <span style="color: #ff0000;">2.8</span>  <span style="color: #ff0000;">20</span>  <span style="color: #ff0000;">33</span>
<span style="color: #ff0000;">6</span>     <span style="color: #ff0000;">2</span>     <span style="color: #ff0000;">3</span> <span style="color: #ff0000;">1.9</span> <span style="color: #ff0000;">2.4</span>  <span style="color: #ff0000;">22</span>  <span style="color: #ff0000;">33</span>
<span style="color: #ff0000;">7</span>     <span style="color: #ff0000;">3</span>     <span style="color: #ff0000;">1</span> <span style="color: #ff0000;">2.2</span> <span style="color: #ff0000;">3.3</span>  <span style="color: #ff0000;">22</span>  <span style="color: #ff0000;">28</span>
<span style="color: #ff0000;">8</span>     <span style="color: #ff0000;">3</span>     <span style="color: #ff0000;">2</span> <span style="color: #ff0000;">2.3</span> <span style="color: #ff0000;">3.4</span>  <span style="color: #ff0000;">20</span>  <span style="color: #ff0000;">30</span>
<span style="color: #ff0000;">9</span>     <span style="color: #ff0000;">3</span>     <span style="color: #ff0000;">3</span> <span style="color: #ff0000;">2.1</span> <span style="color: #ff0000;">2.9</span>  <span style="color: #ff0000;">22</span>  <span style="color: #ff0000;">31</span>
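# Sketch (not part of the original session): the height columns of kidshtwt,
# typed in from the printout above, reshaped without an idvar.
kidsht = data.frame(famid = rep(1:3, each = 3), birth = rep(1:3, times = 3),
                    ht1 = c(2.8, 2.9, 2.2, 2.0, 1.8, 1.9, 2.2, 2.3, 2.1),
                    ht2 = c(3.4, 3.8, 2.9, 3.2, 2.8, 2.4, 3.3, 3.4, 2.9))
# With no idvar, reshape() adds a column named "id" that numbers the original
# rows (1 to 9). famid and birth are not in varying, so they are carried along.
l.noid = reshape(kidsht, direction = "long", varying = 3:4, sep = "",
                 timevar = "age")
print(l.noid[order(l.noid$id, l.noid$age), ])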
<span style="color: #080;">&gt;</span> <span style="color: #228B22;"># Note the use of &quot;timevar&quot; to name the &quot;times&quot; column more appropriately.</span>
<span style="color: #080;">&gt;</span> <span style="color: #228B22;"># Also, we want to exclude the weight data from our reshape.</span>
<span style="color: #080;">&gt;</span> l.<span style="">kidsht</span> <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">reshape</span><span style="color: #080;">&#40;</span>kidshtwt<span style="color: #080;">&#91;</span><span style="color: #080;">-</span><span style="color: #0000FF; font-weight: bold;">c</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">5</span>, <span style="color: #ff0000;">6</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#93;</span>, direction<span style="color: #080;">=</span><span style="color: #ff0000;">&quot;long&quot;</span>, varying<span style="color: #080;">=</span><span style="color: #ff0000;">3</span><span style="color: #080;">:</span><span style="color: #ff0000;">4</span>, sep<span style="color: #080;">=</span><span style="color: #ff0000;">&quot;&quot;</span>,
<span style="color: #080;">+</span>                    idvar<span style="color: #080;">=</span><span style="color: #ff0000;">1</span><span style="color: #080;">:</span><span style="color: #ff0000;">2</span>, timevar<span style="color: #080;">=</span><span style="color: #ff0000;">&quot;age&quot;</span><span style="color: #080;">&#41;</span>
<span style="color: #080;">&gt;</span> <span style="color: #228B22;"># There are other ways to do that previous step. Here is one:</span>
<span style="color: #080;">&gt;</span> <span style="color: #228B22;">#      &gt; l.kidsht = reshape(kidshtwt, direction=&quot;long&quot;, idvar=1:2,</span>
<span style="color: #080;">&gt;</span> <span style="color: #228B22;">#      +                    varying=3:4, drop=5:6, sep=&quot;&quot;, timevar=&quot;age&quot;)</span>
<span style="color: #080;">&gt;</span> <span style="color: #228B22;"># Let's sort the data first by the family id then by the birth order</span>
<span style="color: #080;">&gt;</span> l.<span style="">kidsht</span><span style="color: #080;">&#91;</span><span style="color: #0000FF; font-weight: bold;">order</span><span style="color: #080;">&#40;</span>l.<span style="">kidsht</span>$famid, l.<span style="">kidsht</span>$birth<span style="color: #080;">&#41;</span>,<span style="color: #080;">&#93;</span>
      famid birth age  ht
1.1.1     <span style="color: #ff0000;">1</span>     <span style="color: #ff0000;">1</span>   <span style="color: #ff0000;">1</span> <span style="color: #ff0000;">2.8</span>
1.1.2     <span style="color: #ff0000;">1</span>     <span style="color: #ff0000;">1</span>   <span style="color: #ff0000;">2</span> <span style="color: #ff0000;">3.4</span>
1.2.1     <span style="color: #ff0000;">1</span>     <span style="color: #ff0000;">2</span>   <span style="color: #ff0000;">1</span> <span style="color: #ff0000;">2.9</span>
1.2.2     <span style="color: #ff0000;">1</span>     <span style="color: #ff0000;">2</span>   <span style="color: #ff0000;">2</span> <span style="color: #ff0000;">3.8</span>
1.3.1     <span style="color: #ff0000;">1</span>     <span style="color: #ff0000;">3</span>   <span style="color: #ff0000;">1</span> <span style="color: #ff0000;">2.2</span>
1.3.2     <span style="color: #ff0000;">1</span>     <span style="color: #ff0000;">3</span>   <span style="color: #ff0000;">2</span> <span style="color: #ff0000;">2.9</span>
2.1.1     <span style="color: #ff0000;">2</span>     <span style="color: #ff0000;">1</span>   <span style="color: #ff0000;">1</span> <span style="color: #ff0000;">2.0</span>
2.1.2     <span style="color: #ff0000;">2</span>     <span style="color: #ff0000;">1</span>   <span style="color: #ff0000;">2</span> <span style="color: #ff0000;">3.2</span>
2.2.1     <span style="color: #ff0000;">2</span>     <span style="color: #ff0000;">2</span>   <span style="color: #ff0000;">1</span> <span style="color: #ff0000;">1.8</span>
2.2.2     <span style="color: #ff0000;">2</span>     <span style="color: #ff0000;">2</span>   <span style="color: #ff0000;">2</span> <span style="color: #ff0000;">2.8</span>
2.3.1     <span style="color: #ff0000;">2</span>     <span style="color: #ff0000;">3</span>   <span style="color: #ff0000;">1</span> <span style="color: #ff0000;">1.9</span>
2.3.2     <span style="color: #ff0000;">2</span>     <span style="color: #ff0000;">3</span>   <span style="color: #ff0000;">2</span> <span style="color: #ff0000;">2.4</span>
3.1.1     <span style="color: #ff0000;">3</span>     <span style="color: #ff0000;">1</span>   <span style="color: #ff0000;">1</span> <span style="color: #ff0000;">2.2</span>
3.1.2     <span style="color: #ff0000;">3</span>     <span style="color: #ff0000;">1</span>   <span style="color: #ff0000;">2</span> <span style="color: #ff0000;">3.3</span>
3.2.1     <span style="color: #ff0000;">3</span>     <span style="color: #ff0000;">2</span>   <span style="color: #ff0000;">1</span> <span style="color: #ff0000;">2.3</span>
3.2.2     <span style="color: #ff0000;">3</span>     <span style="color: #ff0000;">2</span>   <span style="color: #ff0000;">2</span> <span style="color: #ff0000;">3.4</span>
3.3.1     <span style="color: #ff0000;">3</span>     <span style="color: #ff0000;">3</span>   <span style="color: #ff0000;">1</span> <span style="color: #ff0000;">2.1</span>
3.3.2     <span style="color: #ff0000;">3</span>     <span style="color: #ff0000;">3</span>   <span style="color: #ff0000;">2</span> <span style="color: #ff0000;">2.9</span></pre></td></tr></table></div><p>The third example on UCLA&#8217;s page is straightforward. It uses the same data we loaded in example 2, but this time we reshape all four measurement columns (height and weight at ages 1 and 2), so <code>varying</code> is <code>3:6</code> and the result has both an <code>ht</code> and a <code>wt</code> column.</p><div
class="wp_codebox_msgheader"><span
class="left">RSPLUS</span></div><div
class="wp_codebox"><table><tr
id="p111622"><td
class="code" id="p1116code22"><pre class="rsplus" style="font-family:monospace;"><span style="color: #080;">&gt;</span> l.<span style="">kidshtwt</span> <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">reshape</span><span style="color: #080;">&#40;</span>kidshtwt, direction<span style="color: #080;">=</span><span style="color: #ff0000;">&quot;long&quot;</span>, idvar<span style="color: #080;">=</span><span style="color: #ff0000;">1</span><span style="color: #080;">:</span><span style="color: #ff0000;">2</span>,varying<span style="color: #080;">=</span><span style="color: #ff0000;">3</span><span style="color: #080;">:</span><span style="color: #ff0000;">6</span>,
<span style="color: #080;">+</span>                      sep<span style="color: #080;">=</span><span style="color: #ff0000;">&quot;&quot;</span>, timevar<span style="color: #080;">=</span><span style="color: #ff0000;">&quot;age&quot;</span><span style="color: #080;">&#41;</span>
<span style="color: #080;">&gt;</span> l.<span style="">kidshtwt</span><span style="color: #080;">&#91;</span><span style="color: #0000FF; font-weight: bold;">order</span><span style="color: #080;">&#40;</span>l.<span style="">kidshtwt</span>$famid, l.<span style="">kidshtwt</span>$birth<span style="color: #080;">&#41;</span>,<span style="color: #080;">&#93;</span>
      famid birth age  ht wt
1.1.1     <span style="color: #ff0000;">1</span>     <span style="color: #ff0000;">1</span>   <span style="color: #ff0000;">1</span> <span style="color: #ff0000;">2.8</span> <span style="color: #ff0000;">19</span>
1.1.2     <span style="color: #ff0000;">1</span>     <span style="color: #ff0000;">1</span>   <span style="color: #ff0000;">2</span> <span style="color: #ff0000;">3.4</span> <span style="color: #ff0000;">28</span>
1.2.1     <span style="color: #ff0000;">1</span>     <span style="color: #ff0000;">2</span>   <span style="color: #ff0000;">1</span> <span style="color: #ff0000;">2.9</span> <span style="color: #ff0000;">21</span>
1.2.2     <span style="color: #ff0000;">1</span>     <span style="color: #ff0000;">2</span>   <span style="color: #ff0000;">2</span> <span style="color: #ff0000;">3.8</span> <span style="color: #ff0000;">28</span>
1.3.1     <span style="color: #ff0000;">1</span>     <span style="color: #ff0000;">3</span>   <span style="color: #ff0000;">1</span> <span style="color: #ff0000;">2.2</span> <span style="color: #ff0000;">20</span>
1.3.2     <span style="color: #ff0000;">1</span>     <span style="color: #ff0000;">3</span>   <span style="color: #ff0000;">2</span> <span style="color: #ff0000;">2.9</span> <span style="color: #ff0000;">23</span>
2.1.1     <span style="color: #ff0000;">2</span>     <span style="color: #ff0000;">1</span>   <span style="color: #ff0000;">1</span> <span style="color: #ff0000;">2.0</span> <span style="color: #ff0000;">25</span>
2.1.2     <span style="color: #ff0000;">2</span>     <span style="color: #ff0000;">1</span>   <span style="color: #ff0000;">2</span> <span style="color: #ff0000;">3.2</span> <span style="color: #ff0000;">30</span>
2.2.1     <span style="color: #ff0000;">2</span>     <span style="color: #ff0000;">2</span>   <span style="color: #ff0000;">1</span> <span style="color: #ff0000;">1.8</span> <span style="color: #ff0000;">20</span>
2.2.2     <span style="color: #ff0000;">2</span>     <span style="color: #ff0000;">2</span>   <span style="color: #ff0000;">2</span> <span style="color: #ff0000;">2.8</span> <span style="color: #ff0000;">33</span>
2.3.1     <span style="color: #ff0000;">2</span>     <span style="color: #ff0000;">3</span>   <span style="color: #ff0000;">1</span> <span style="color: #ff0000;">1.9</span> <span style="color: #ff0000;">22</span>
2.3.2     <span style="color: #ff0000;">2</span>     <span style="color: #ff0000;">3</span>   <span style="color: #ff0000;">2</span> <span style="color: #ff0000;">2.4</span> <span style="color: #ff0000;">33</span>
3.1.1     <span style="color: #ff0000;">3</span>     <span style="color: #ff0000;">1</span>   <span style="color: #ff0000;">1</span> <span style="color: #ff0000;">2.2</span> <span style="color: #ff0000;">22</span>
3.1.2     <span style="color: #ff0000;">3</span>     <span style="color: #ff0000;">1</span>   <span style="color: #ff0000;">2</span> <span style="color: #ff0000;">3.3</span> <span style="color: #ff0000;">28</span>
3.2.1     <span style="color: #ff0000;">3</span>     <span style="color: #ff0000;">2</span>   <span style="color: #ff0000;">1</span> <span style="color: #ff0000;">2.3</span> <span style="color: #ff0000;">20</span>
3.2.2     <span style="color: #ff0000;">3</span>     <span style="color: #ff0000;">2</span>   <span style="color: #ff0000;">2</span> <span style="color: #ff0000;">3.4</span> <span style="color: #ff0000;">30</span>
3.3.1     <span style="color: #ff0000;">3</span>     <span style="color: #ff0000;">3</span>   <span style="color: #ff0000;">1</span> <span style="color: #ff0000;">2.1</span> <span style="color: #ff0000;">22</span>
3.3.2     <span style="color: #ff0000;">3</span>     <span style="color: #ff0000;">3</span>   <span style="color: #ff0000;">2</span> <span style="color: #ff0000;">2.9</span> <span style="color: #ff0000;">31</span></pre></td></tr></table></div><p>The fourth example was the trickiest one for me at first. Here, the variables are distinguished not by a numeric &#8220;time&#8221; but by a letter. The variable names are &#8220;famid&#8221;, &#8220;named&#8221;, &#8220;incd&#8221;, &#8220;namem&#8221;, and &#8220;incm&#8221;. In other words, they hold the name and income of the dad (variables ending in &#8220;d&#8221;) and the mom (variables ending in &#8220;m&#8221;) in each family. Because <code>reshape()</code> cannot split these names into a stub and a numeric time, we supply the long-format variable names with <code>v.names</code>, name the new column with <code>timevar</code>, and give its values with <code>times</code>.</p><div
class="wp_codebox_msgheader"><span
class="left">RSPLUS</span></div><div
class="wp_codebox"><table><tr
id="p111623"><td
class="code" id="p1116code23"><pre class="rsplus" style="font-family:monospace;"><span style="color: #080;">&gt;</span> <span style="color: #228B22;"># Example 4: Non-numeric suffixes on the variable names</span>
<span style="color: #080;">&gt;</span> dadmomw <span style="color: #080;">=</span> read.<span style="">dta</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;http://www.ats.ucla.edu/stat/stata/modules/dadmomw.dta&quot;</span><span style="color: #080;">&#41;</span>
<span style="color: #080;">&gt;</span> dadmomw
  famid named  incd namem  incm
<span style="color: #ff0000;">1</span>     <span style="color: #ff0000;">1</span>  Bill <span style="color: #ff0000;">30000</span>  Bess <span style="color: #ff0000;">15000</span>
<span style="color: #ff0000;">2</span>     <span style="color: #ff0000;">2</span>   Art <span style="color: #ff0000;">22000</span>   Amy <span style="color: #ff0000;">18000</span>
<span style="color: #ff0000;">3</span>     <span style="color: #ff0000;">3</span>  Paul <span style="color: #ff0000;">25000</span>   Pat <span style="color: #ff0000;">50000</span>
<span style="color: #080;">&gt;</span> r.<span style="">dadmomw</span> <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">reshape</span><span style="color: #080;">&#40;</span>dadmomw, direction<span style="color: #080;">=</span><span style="color: #ff0000;">&quot;long&quot;</span>, idvar<span style="color: #080;">=</span><span style="color: #ff0000;">1</span>, varying<span style="color: #080;">=</span><span style="color: #ff0000;">2</span><span style="color: #080;">:</span><span style="color: #ff0000;">5</span>,
<span style="color: #080;">+</span>                     sep<span style="color: #080;">=</span><span style="color: #ff0000;">&quot;&quot;</span>, v.<span style="">names</span><span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">c</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;name&quot;</span>, <span style="color: #ff0000;">&quot;inc&quot;</span><span style="color: #080;">&#41;</span>,
<span style="color: #080;">+</span>                     timevar<span style="color: #080;">=</span><span style="color: #ff0000;">&quot;dadmom&quot;</span>, times<span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">c</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;dad&quot;</span>, <span style="color: #ff0000;">&quot;mom&quot;</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
<span style="color: #080;">&gt;</span> r.<span style="">dadmomw</span><span style="color: #080;">&#91;</span><span style="color: #0000FF; font-weight: bold;">order</span><span style="color: #080;">&#40;</span>r.<span style="">dadmomw</span>$famid<span style="color: #080;">&#41;</span>,<span style="color: #080;">&#93;</span>
      famid dadmom  name  inc
<span style="color: #ff0000;">1</span>.<span style="">dad</span>     <span style="color: #ff0000;">1</span>    dad <span style="color: #ff0000;">30000</span> Bill
<span style="color: #ff0000;">1</span>.<span style="">mom</span>     <span style="color: #ff0000;">1</span>    mom <span style="color: #ff0000;">15000</span> Bess
<span style="color: #ff0000;">2</span>.<span style="">dad</span>     <span style="color: #ff0000;">2</span>    dad <span style="color: #ff0000;">22000</span>  Art
<span style="color: #ff0000;">2</span>.<span style="">mom</span>     <span style="color: #ff0000;">2</span>    mom <span style="color: #ff0000;">18000</span>  Amy
<span style="color: #ff0000;">3</span>.<span style="">dad</span>     <span style="color: #ff0000;">3</span>    dad <span style="color: #ff0000;">25000</span> Paul
<span style="color: #ff0000;">3</span>.<span style="">mom</span>     <span style="color: #ff0000;">3</span>    mom <span style="color: #ff0000;">50000</span>  Pat</pre></td></tr></table></div><p>Stata&#8217;s commands are certainly more direct (see below for what you would do for the last example in Stata). R&#8217;s commands sometimes tend to be a bit verbose, but in some ways, that might also help you remember what you&#8217;re doing. (I still don&#8217;t know what &#8220;i&#8221; and &#8220;j&#8221; in the Stata reshape commands stand for.) If you can afford the ~ $2,500 price tag, <a
href="http://ekonometrics.blogspot.com/2011/04/speeding-tickets-for-r-and-stata.html">Stata is</a> <a
href="http://ekonometrics.blogspot.com/2011/04/going-over-speed-limit.html">also faster</a>.</p><div
class="wp_codebox_msgheader"><span
class="left">STATA</span></div><div
class="wp_codebox"><table><tr
id="p111624"><td
class="code" id="p1116code24"><pre class="stata" style="font-family:monospace;">use http://www.ats.ucla.edu/stat/stata/modules/dadmomw, clear
list
reshape long name  inc, i(famid) j(dadmom) string
list</pre></td></tr></table></div> ]]></content:encoded> <wfw:commentRss>http://news.mrdwab.com/2011/04/18/reshaping-data-in-r-revisited/feed/</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item><title>Regular expressions in R</title><link>http://news.mrdwab.com/2011/03/08/regular-expressions-in-r/</link> <comments>http://news.mrdwab.com/2011/03/08/regular-expressions-in-r/#comments</comments> <pubDate>Tue, 08 Mar 2011 18:06:11 +0000</pubDate> <dc:creator>Ananda</dc:creator> <category><![CDATA[Geekiness]]></category> <category><![CDATA[Useless Knowledge]]></category> <category><![CDATA[data manipulation]]></category> <category><![CDATA[R]]></category> <category><![CDATA[regular expressions]]></category> <guid
isPermaLink="false">http://news.mrdwab.com/?p=916</guid> <description><![CDATA[In my last post (Sounds interesting. Is that a regular expression?), I showed a few things I had figured out recently related to regular expressions. By now, you have also figured out that I like figuring things out in R, and application of regular expressions is one of these things. Since R is scriptable, it [...]]]></description> <content:encoded><![CDATA[<p>In my last post (<a
href="http://news.mrdwab.com/2011/03/07/sounds-interesting-is-that-a-regular-expression/" title="Sounds interesting. Is that a regular expression?">Sounds interesting. Is that a regular expression?</a>), I showed a few things I had recently figured out about regular expressions. By now, you have probably also noticed that I like figuring things out in R, and applying regular expressions is one of those things.</p><p><span
id="more-916"></span></p><p>Since R is scriptable, it is easy to put a series of regular expressions to work to get the results you need. Consider the following, which uses <a
href="http://news.mrdwab.com/wp-content/uploads/2011/03/unprocessed-1.txt">this text file</a> as the input and gives us the same output as &#8220;Example 3&#8221; from my earlier post:</p><div
class="wp_codebox_msgheader"><span
class="left">RSPLUS</span></div><div
class="wp_codebox"><table><tr
id="p91627"><td
class="code" id="p916code27"><pre class="rsplus" style="font-family:monospace;"><span style="color: #080;">&gt;</span> a <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">readLines</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;http://news.mrdwab.com/wp-content/uploads/2011/03/unprocessed-1.txt&quot;</span><span style="color: #080;">&#41;</span>
<span style="color: #080;">&gt;</span> b <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">gsub</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;^([01]:[ |0-9]+)$&quot;</span>, <span style="color: #ff0000;">&quot;&quot;</span>, a<span style="color: #080;">&#41;</span>
<span style="color: #080;">&gt;</span> b <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">gsub</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;^([0-9]|[0-9-]+)<span style="color: #000099; font-weight: bold;">\\</span>.([0-9]{4,5})&quot;</span>, <span style="color: #ff0000;">&quot;&quot;</span>, b<span style="color: #080;">&#41;</span>
<span style="color: #080;">&gt;</span> b <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">gsub</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;^([A-Z])$&quot;</span>, <span style="color: #ff0000;">&quot;&quot;</span>, b<span style="color: #080;">&#41;</span>
<span style="color: #080;">&gt;</span> birthweight.<span style="">percentiles</span> <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">matrix</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">scan</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">textConnection</span><span style="color: #080;">&#40;</span>b<span style="color: #080;">&#41;</span>, skip<span style="color: #080;">=</span><span style="color: #ff0000;">17</span><span style="color: #080;">&#41;</span>, <span style="color: #0000FF; font-weight: bold;">ncol</span><span style="color: #080;">=</span><span style="color: #ff0000;">12</span>, byrow<span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">T</span><span style="color: #080;">&#41;</span>
Read <span style="color: #ff0000;">156</span> items
<span style="color: #080;">&gt;</span> <span style="color: #0000FF; font-weight: bold;">colnames</span><span style="color: #080;">&#40;</span>birthweight.<span style="">percentiles</span><span style="color: #080;">&#41;</span> <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">c</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;Month&quot;</span>, <span style="color: #0000FF; font-weight: bold;">scan</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">textConnection</span><span style="color: #080;">&#40;</span>b<span style="color: #080;">&#41;</span>,
<span style="color: #080;">+</span>                                       what<span style="color: #080;">=</span><span style="color: #ff0000;">&quot;character&quot;</span>, skip<span style="color: #080;">=</span><span style="color: #ff0000;">5</span>, n<span style="color: #080;">=</span><span style="color: #ff0000;">11</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
Read <span style="color: #ff0000;">11</span> items
<span style="color: #080;">&gt;</span> birthweight.<span style="">percentiles</span>
      Month 1st 3rd 5th 15th 25th 50th 75th 85th 95th 97th 99th
 <span style="color: #080;">&#91;</span><span style="color: #ff0000;">1</span>,<span style="color: #080;">&#93;</span>     <span style="color: #ff0000;">0</span> <span style="color: #ff0000;">2.3</span> <span style="color: #ff0000;">2.4</span> <span style="color: #ff0000;">2.5</span>  <span style="color: #ff0000;">2.8</span>  <span style="color: #ff0000;">2.9</span>  <span style="color: #ff0000;">3.2</span>  <span style="color: #ff0000;">3.6</span>  <span style="color: #ff0000;">3.7</span>  <span style="color: #ff0000;">4.0</span>  <span style="color: #ff0000;">4.2</span>  <span style="color: #ff0000;">4.4</span>
 <span style="color: #080;">&#91;</span><span style="color: #ff0000;">2</span>,<span style="color: #080;">&#93;</span>     <span style="color: #ff0000;">1</span> <span style="color: #ff0000;">3.0</span> <span style="color: #ff0000;">3.2</span> <span style="color: #ff0000;">3.3</span>  <span style="color: #ff0000;">3.6</span>  <span style="color: #ff0000;">3.8</span>  <span style="color: #ff0000;">4.2</span>  <span style="color: #ff0000;">4.6</span>  <span style="color: #ff0000;">4.8</span>  <span style="color: #ff0000;">5.2</span>  <span style="color: #ff0000;">5.4</span>  <span style="color: #ff0000;">5.7</span>
 <span style="color: #080;">&#91;</span><span style="color: #ff0000;">3</span>,<span style="color: #080;">&#93;</span>     <span style="color: #ff0000;">2</span> <span style="color: #ff0000;">3.8</span> <span style="color: #ff0000;">4.0</span> <span style="color: #ff0000;">4.1</span>  <span style="color: #ff0000;">4.5</span>  <span style="color: #ff0000;">4.7</span>  <span style="color: #ff0000;">5.1</span>  <span style="color: #ff0000;">5.6</span>  <span style="color: #ff0000;">5.9</span>  <span style="color: #ff0000;">6.3</span>  <span style="color: #ff0000;">6.5</span>  <span style="color: #ff0000;">6.9</span>
 <span style="color: #080;">&#91;</span><span style="color: #ff0000;">4</span>,<span style="color: #080;">&#93;</span>     <span style="color: #ff0000;">3</span> <span style="color: #ff0000;">4.4</span> <span style="color: #ff0000;">4.6</span> <span style="color: #ff0000;">4.7</span>  <span style="color: #ff0000;">5.1</span>  <span style="color: #ff0000;">5.4</span>  <span style="color: #ff0000;">5.8</span>  <span style="color: #ff0000;">6.4</span>  <span style="color: #ff0000;">6.7</span>  <span style="color: #ff0000;">7.2</span>  <span style="color: #ff0000;">7.4</span>  <span style="color: #ff0000;">7.8</span>
 <span style="color: #080;">&#91;</span><span style="color: #ff0000;">5</span>,<span style="color: #080;">&#93;</span>     <span style="color: #ff0000;">4</span> <span style="color: #ff0000;">4.8</span> <span style="color: #ff0000;">5.1</span> <span style="color: #ff0000;">5.2</span>  <span style="color: #ff0000;">5.6</span>  <span style="color: #ff0000;">5.9</span>  <span style="color: #ff0000;">6.4</span>  <span style="color: #ff0000;">7.0</span>  <span style="color: #ff0000;">7.3</span>  <span style="color: #ff0000;">7.9</span>  <span style="color: #ff0000;">8.1</span>  <span style="color: #ff0000;">8.6</span>
 <span style="color: #080;">&#91;</span><span style="color: #ff0000;">6</span>,<span style="color: #080;">&#93;</span>     <span style="color: #ff0000;">5</span> <span style="color: #ff0000;">5.2</span> <span style="color: #ff0000;">5.5</span> <span style="color: #ff0000;">5.6</span>  <span style="color: #ff0000;">6.1</span>  <span style="color: #ff0000;">6.4</span>  <span style="color: #ff0000;">6.9</span>  <span style="color: #ff0000;">7.5</span>  <span style="color: #ff0000;">7.8</span>  <span style="color: #ff0000;">8.4</span>  <span style="color: #ff0000;">8.7</span>  <span style="color: #ff0000;">9.2</span>
 <span style="color: #080;">&#91;</span><span style="color: #ff0000;">7</span>,<span style="color: #080;">&#93;</span>     <span style="color: #ff0000;">6</span> <span style="color: #ff0000;">5.5</span> <span style="color: #ff0000;">5.8</span> <span style="color: #ff0000;">6.0</span>  <span style="color: #ff0000;">6.4</span>  <span style="color: #ff0000;">6.7</span>  <span style="color: #ff0000;">7.3</span>  <span style="color: #ff0000;">7.9</span>  <span style="color: #ff0000;">8.3</span>  <span style="color: #ff0000;">8.9</span>  <span style="color: #ff0000;">9.2</span>  <span style="color: #ff0000;">9.7</span>
 <span style="color: #080;">&#91;</span><span style="color: #ff0000;">8</span>,<span style="color: #080;">&#93;</span>     <span style="color: #ff0000;">7</span> <span style="color: #ff0000;">5.8</span> <span style="color: #ff0000;">6.1</span> <span style="color: #ff0000;">6.3</span>  <span style="color: #ff0000;">6.7</span>  <span style="color: #ff0000;">7.0</span>  <span style="color: #ff0000;">7.6</span>  <span style="color: #ff0000;">8.3</span>  <span style="color: #ff0000;">8.7</span>  <span style="color: #ff0000;">9.4</span>  <span style="color: #ff0000;">9.6</span> <span style="color: #ff0000;">10.2</span>
 <span style="color: #080;">&#91;</span><span style="color: #ff0000;">9</span>,<span style="color: #080;">&#93;</span>     <span style="color: #ff0000;">8</span> <span style="color: #ff0000;">6.0</span> <span style="color: #ff0000;">6.3</span> <span style="color: #ff0000;">6.5</span>  <span style="color: #ff0000;">7.0</span>  <span style="color: #ff0000;">7.3</span>  <span style="color: #ff0000;">7.9</span>  <span style="color: #ff0000;">8.6</span>  <span style="color: #ff0000;">9.0</span>  <span style="color: #ff0000;">9.7</span> <span style="color: #ff0000;">10.0</span> <span style="color: #ff0000;">10.6</span>
<span style="color: #080;">&#91;</span><span style="color: #ff0000;">10</span>,<span style="color: #080;">&#93;</span>     <span style="color: #ff0000;">9</span> <span style="color: #ff0000;">6.2</span> <span style="color: #ff0000;">6.6</span> <span style="color: #ff0000;">6.8</span>  <span style="color: #ff0000;">7.3</span>  <span style="color: #ff0000;">7.6</span>  <span style="color: #ff0000;">8.2</span>  <span style="color: #ff0000;">8.9</span>  <span style="color: #ff0000;">9.3</span> <span style="color: #ff0000;">10.1</span> <span style="color: #ff0000;">10.4</span> <span style="color: #ff0000;">11.0</span>
<span style="color: #080;">&#91;</span><span style="color: #ff0000;">11</span>,<span style="color: #080;">&#93;</span>    <span style="color: #ff0000;">10</span> <span style="color: #ff0000;">6.4</span> <span style="color: #ff0000;">6.8</span> <span style="color: #ff0000;">7.0</span>  <span style="color: #ff0000;">7.5</span>  <span style="color: #ff0000;">7.8</span>  <span style="color: #ff0000;">8.5</span>  <span style="color: #ff0000;">9.2</span>  <span style="color: #ff0000;">9.6</span> <span style="color: #ff0000;">10.4</span> <span style="color: #ff0000;">10.7</span> <span style="color: #ff0000;">11.3</span>
<span style="color: #080;">&#91;</span><span style="color: #ff0000;">12</span>,<span style="color: #080;">&#93;</span>    <span style="color: #ff0000;">11</span> <span style="color: #ff0000;">6.6</span> <span style="color: #ff0000;">7.0</span> <span style="color: #ff0000;">7.2</span>  <span style="color: #ff0000;">7.7</span>  <span style="color: #ff0000;">8.0</span>  <span style="color: #ff0000;">8.7</span>  <span style="color: #ff0000;">9.5</span>  <span style="color: #ff0000;">9.9</span> <span style="color: #ff0000;">10.7</span> <span style="color: #ff0000;">11.0</span> <span style="color: #ff0000;">11.7</span>
<span style="color: #080;">&#91;</span><span style="color: #ff0000;">13</span>,<span style="color: #080;">&#93;</span>    <span style="color: #ff0000;">12</span> <span style="color: #ff0000;">6.8</span> <span style="color: #ff0000;">7.1</span> <span style="color: #ff0000;">7.3</span>  <span style="color: #ff0000;">7.9</span>  <span style="color: #ff0000;">8.2</span>  <span style="color: #ff0000;">8.9</span>  <span style="color: #ff0000;">9.7</span> <span style="color: #ff0000;">10.2</span> <span style="color: #ff0000;">11.0</span> <span style="color: #ff0000;">11.3</span> <span style="color: #ff0000;">12.0</span></pre></td></tr></table></div><p>Similarly, we can replicate the &#8220;bonus session&#8221; (which is based <a
href="http://news.mrdwab.com/wp-content/uploads/2011/03/unprocessed-5.txt">on this text file</a>) as follows:</p><div
class="wp_codebox"><table><tr
id="p91628"><td
class="code" id="p916code28"><pre class="rsplus" style="font-family:monospace;"><span style="color: #080;">&gt;</span> n <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">readLines</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;http://news.mrdwab.com/wp-content/uploads/2011/03/unprocessed-5.txt&quot;</span><span style="color: #080;">&#41;</span>
<span style="color: #080;">&gt;</span> org.<span style="">name</span> <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">gsub</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;^([0-9]<span style="color: #000099; font-weight: bold;">\\</span>. )(.*) <span style="color: #000099; font-weight: bold;">\\</span>(.*&quot;</span>, <span style="color: #ff0000;">&quot;'<span style="color: #000099; font-weight: bold;">\\</span>2'&quot;</span>, n<span style="color: #080;">&#41;</span>
<span style="color: #080;">&gt;</span> org.<span style="">name</span> <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">gsub</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;^[0-9].*&quot;</span>, <span style="color: #ff0000;">&quot;&quot;</span>, org.<span style="">name</span><span style="color: #080;">&#41;</span>
<span style="color: #080;">&gt;</span> orgs <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">rep</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">scan</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">textConnection</span><span style="color: #080;">&#40;</span>org.<span style="">name</span><span style="color: #080;">&#41;</span>, what<span style="color: #080;">=</span><span style="color: #ff0000;">&quot;character&quot;</span><span style="color: #080;">&#41;</span>, <span style="color: #0000FF; font-weight: bold;">c</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">16</span>, <span style="color: #ff0000;">5</span>, <span style="color: #ff0000;">1</span>, <span style="color: #ff0000;">1</span>, <span style="color: #ff0000;">2</span>, <span style="color: #ff0000;">4</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
Read <span style="color: #ff0000;">6</span> items
<span style="color: #080;">&gt;</span> ss <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">gsub</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;^([0-9]<span style="color: #000099; font-weight: bold;">\\</span>. )(.*)<span style="color: #000099; font-weight: bold;">\\</span>(([0-9]+)<span style="color: #000099; font-weight: bold;">\\</span>)( )&quot;</span>, <span style="color: #ff0000;">&quot;&quot;</span>, n<span style="color: #080;">&#41;</span>
<span style="color: #080;">&gt;</span> ss <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">gsub</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;^([0-9]+) (.*) (.*)&quot;</span>, <span style="color: #ff0000;">&quot;<span style="color: #000099; font-weight: bold;">\\</span>2,<span style="color: #000099; font-weight: bold;">\\</span>3&quot;</span>, ss<span style="color: #080;">&#41;</span>
<span style="color: #080;">&gt;</span> states.<span style="">sites</span> <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">read.<span style="">csv</span></span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">textConnection</span><span style="color: #080;">&#40;</span>ss<span style="color: #080;">&#41;</span>, header<span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">F</span><span style="color: #080;">&#41;</span>
<span style="color: #080;">&gt;</span> operation.<span style="">areas</span> <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">cbind</span><span style="color: #080;">&#40;</span>orgs, states.<span style="">sites</span><span style="color: #080;">&#41;</span>
<span style="color: #080;">&gt;</span> <span style="color: #0000FF; font-weight: bold;">colnames</span><span style="color: #080;">&#40;</span>operation.<span style="">areas</span><span style="color: #080;">&#41;</span> <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">c</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;Organization&quot;</span>, <span style="color: #ff0000;">&quot;State&quot;</span>, <span style="color: #ff0000;">&quot;Sites&quot;</span><span style="color: #080;">&#41;</span>
<span style="color: #080;">&gt;</span> operation.<span style="">areas</span>
           Organization             State Sites
<span style="color: #ff0000;">1</span>        Organization M    Andhra Pradesh     <span style="color: #ff0000;">7</span>
<span style="color: #ff0000;">2</span>        Organization M Arunachal Pradesh     <span style="color: #ff0000;">8</span>
<span style="color: #ff0000;">3</span>        Organization M             Assam     <span style="color: #ff0000;">8</span>
<span style="color: #ff0000;">4</span>        Organization M             Bihar    <span style="color: #ff0000;">24</span>
<span style="color: #ff0000;">5</span>        Organization M       Chattisgarh     <span style="color: #ff0000;">2</span>
<span style="color: #ff0000;">6</span>        Organization M               Goa    <span style="color: #ff0000;">15</span>
<span style="color: #ff0000;">7</span>        Organization M           Gujarat    <span style="color: #ff0000;">19</span>
<span style="color: #ff0000;">8</span>        Organization M           Haryana     <span style="color: #ff0000;">4</span>
<span style="color: #ff0000;">9</span>        Organization M  Himachal Pradesh    <span style="color: #ff0000;">14</span>
<span style="color: #ff0000;">10</span>       Organization M Jammu and Kashmir     <span style="color: #ff0000;">2</span>
<span style="color: #ff0000;">11</span>       Organization M         Jharkhand     <span style="color: #ff0000;">2</span>
<span style="color: #ff0000;">12</span>       Organization M         Karnataka     <span style="color: #ff0000;">4</span>
<span style="color: #ff0000;">13</span>       Organization M            Kerala     <span style="color: #ff0000;">2</span>
<span style="color: #ff0000;">14</span>       Organization M    Madhya Pradesh     <span style="color: #ff0000;">2</span>
<span style="color: #ff0000;">15</span>       Organization M       Maharashtra     <span style="color: #ff0000;">2</span>
<span style="color: #ff0000;">16</span>       Organization M           Manipur     <span style="color: #ff0000;">2</span>
<span style="color: #ff0000;">17</span>         Foundation X         Meghalaya    <span style="color: #ff0000;">29</span>
<span style="color: #ff0000;">18</span>         Foundation X           Mizoram    <span style="color: #ff0000;">10</span>
<span style="color: #ff0000;">19</span>         Foundation X          Nagaland     <span style="color: #ff0000;">4</span>
<span style="color: #ff0000;">20</span>         Foundation X            Odisha    <span style="color: #ff0000;">12</span>
<span style="color: #ff0000;">21</span>         Foundation X        Puducherry    <span style="color: #ff0000;">14</span>
<span style="color: #ff0000;">22</span>                NGO Z            Punjab     <span style="color: #ff0000;">8</span>
<span style="color: #ff0000;">23</span>           Government         Rajasthan    <span style="color: #ff0000;">16</span>
<span style="color: #ff0000;">24</span> Research Institute A            Sikkim     <span style="color: #ff0000;">4</span>
<span style="color: #ff0000;">25</span> Research Institute A        Tamil Nadu     <span style="color: #ff0000;">4</span>
<span style="color: #ff0000;">26</span>       Organization C           Tripura     <span style="color: #ff0000;">8</span>
<span style="color: #ff0000;">27</span>       Organization C     Uttar Pradesh    <span style="color: #ff0000;">15</span>
<span style="color: #ff0000;">28</span>       Organization C       Uttarakhand     <span style="color: #ff0000;">1</span>
<span style="color: #ff0000;">29</span>       Organization C       West Bengal    <span style="color: #ff0000;">12</span></pre></td></tr></table></div><p>Notice the use of <code>readLines</code> to import the text file, <code>gsub</code> to apply each search-and-replace expression, and <code>textConnection</code> to treat an R character vector as if it were a text file. Notice also the doubled backslashes: inside an R string, <code>\\</code> stands for a single backslash, so a regular expression such as <code>\(</code> has to be typed as <code>"\\("</code>. The other steps are more or less the same as they would be in a good text editor.</p><p>By the way, the inspiration for this came from <a
href="http://sas-and-r.blogspot.com/2011/02/example-827-using-regular-expressions.html">here</a>.</p> ]]></content:encoded> <wfw:commentRss>http://news.mrdwab.com/2011/03/08/regular-expressions-in-r/feed/</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item><title>Sounds interesting. Is that a regular expression?</title><link>http://news.mrdwab.com/2011/03/07/sounds-interesting-is-that-a-regular-expression/</link> <comments>http://news.mrdwab.com/2011/03/07/sounds-interesting-is-that-a-regular-expression/#comments</comments> <pubDate>Mon, 07 Mar 2011 05:44:09 +0000</pubDate> <dc:creator>Ananda</dc:creator> <category><![CDATA[(all categories)]]></category> <category><![CDATA[Geekiness]]></category> <category><![CDATA[Useless Knowledge]]></category> <category><![CDATA[R]]></category> <category><![CDATA[regular expressions]]></category> <category><![CDATA[tutorial]]></category> <guid
isPermaLink="false">http://news.mrdwab.com/?p=884</guid> <description><![CDATA[I&#8217;ve been meaning to learn how to use regular expressions for quite some time now, but just never seemed to get around to doing so. The other night, I decided to take a stab at them though, and over the past few days, I&#8217;ve sort of managed to learn a few tricks. Some of these [...]]]></description> <content:encoded><![CDATA[<p>I&#8217;ve been meaning to learn how to use regular expressions for quite some time now, but just never seemed to get around to doing so. The other night, I decided to take a stab at them though, and over the past few days, I&#8217;ve sort of managed to learn a few tricks. Some of these might seem unnecessary, particularly since the examples use relatively small chunks of text, but hopefully you can see how the same techniques apply to larger text files. In some of the examples, I&#8217;ve also shown how they can help with preparing your data for use with a program like R. For all of these examples, I&#8217;ve used Geany as my text editor. I suggest you use a good text editor like <a
href="http://www.geany.org/" target="_blank">Geany</a> or <a
href="http://notepad-plus-plus.org/" target="_blank">Notepad++</a> too.</p><p><span
id="more-884"></span></p><h2>Example 1: Changing numeric date formats</h2><p>Imagine we&#8217;re given a file containing dates in the form m(m)/d(d)/yyyy, and someone gives us totally arbitrary instructions to change them to Y:yyyy (tab) M:mm (tab) D:dd (really, I can&#8217;t tell you who or why).</p><p>Below is our starting text. Note that some of the months and days have only one digit, while other single-digit values are already padded with a leading zero.</p><pre>
09/05/1978
12/11/2003
11/9/2010
3/13/2001
</pre><p>My solution:</p><table
width="100%" cellspacing="12px"><tr><th>Search for this</th><th>Replace with this</th><th
width="250px">Why?</th></tr><tr><td><code>^([0-9])/</code></td><td><code>0\1/</code></td><td
width="250px">To find single-digit months and pad them with a leading zero.</td></tr><tr><td><code>/([0-9])/</code></td><td><code>/0\1/</code></td><td
width="250px">To find single-digit days and pad them with a leading zero.</td></tr><tr><td><code>^([0-9]+)/([0-9]+)/([0-9]+)</code></td><td><code>Y:\3\tM:\1\tD:\2</code></td><td
width="250px">Splits a date into three sections that we can rearrange as we see fit.</td></tr></table><p>The parts of a search pattern enclosed in parentheses () become numbered &#8220;references&#8221; (also called back-references), which we can refer to by their position. The last search pattern above has three pairs of parentheses. The first one, <code>([0-9]+)</code>, matches a number: <code>[0-9]</code> matches a single digit, and the <code>+</code> means &#8220;one or more&#8221; of it, so the match stops at the forward slash, which is not a digit. The <code>^</code> before it anchors the match to the start of a line. In the replacement, <code>\1</code>, <code>\2</code>, and <code>\3</code> are the month, day, and year, and <code>\t</code> inserts a tab.</p><p>If you followed the instructions correctly, you should get the following:</p><pre>
Y:1978	M:09	D:05
Y:2003	M:12	D:11
Y:2010	M:11	D:09
Y:2001	M:03	D:13
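</pre><p>If the dates are already in R, the same three steps can be run with <code>gsub</code>. This is a minimal sketch rather than part of the editor workflow above. It assumes the dates sit in a character vector (the name <code>dates</code> is just for illustration) and puts the month after <code>M:</code> and the day after <code>D:</code>. Inside an R string, the backslash of a back-reference has to be doubled, so the editor&#8217;s <code>\1</code> is typed as <code>"\\1"</code>, while <code>"\t"</code> is simply a tab character.</p><pre>
```r
# Hypothetical input: the four dates from the example, as m(m)/d(d)/yyyy
dates = c("09/05/1978", "12/11/2003", "11/9/2010", "3/13/2001")

# Step 1: pad a single-digit month (at the start of the string) with a zero
dates = gsub("^([0-9])/", "0\\1/", dates)
# Step 2: pad a single-digit day (between the two slashes) with a zero
dates = gsub("/([0-9])/", "/0\\1/", dates)
# Step 3: capture month (1), day (2) and year (3), then rearrange them;
# "\t" in the replacement is a literal tab character
result = gsub("^([0-9]+)/([0-9]+)/([0-9]+)", "Y:\\3\tM:\\1\tD:\\2", dates)

writeLines(result)
```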
</pre><h2>Example 2: Changing names around</h2><p>The same crazy person who wanted that odd date format also wants us to change this list of names from &#8220;First-name Last-name&#8221; to &#8220;Last-name, First-initial&#8221;. We could do this manually, but why should we? Here&#8217;s what we start with:</p><pre>
Ethan Nakata
Stepanie Foutz
Nikole Pritt
Lesley Ramsay
Lucienne Anderson
Ardith Guo
Kassie Roloff
Kathy Edie
Kellee Rowse
Effie Bensinger
Bethel Gravel
Kathaleen Kovac
Candance Clauss
Sherell Dobrowolski
Kym Thurmon
Xiomara Tocci
Brice Tallon
Natalya Bouldin
Jacki Parise
Evonne Mun
</pre><table
width="100%" cellspacing="12px"><tr><th>Search for this</th><th>Replace with this</th><th
width="250px">Why?</th></tr><tr><td><code>^([A-Z])([a-z]+) ([A-Za-z]+)</code></td><td><code>\3, \1.</code></td><td
width="250px">The first set of parentheses captures just the first initial, the second captures the rest of the first name, and the third captures the entire last name. In the replace box, we drop the second reference, put a comma and a space between references 3 and 1, and add a period after reference 1.</td></tr></table><p>Here&#8217;s the result:</p><pre>
Nakata, E.
Foutz, S.
Pritt, N.
Ramsay, L.
Anderson, L.
Guo, A.
Roloff, K.
Edie, K.
Rowse, K.
Bensinger, E.
Gravel, B.
Kovac, K.
Clauss, C.
Dobrowolski, S.
Thurmon, K.
Tocci, X.
Tallon, B.
Bouldin, N.
Parise, J.
Mun, E.
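</pre><p>The name swap translates to R just as directly. Again, this is only a sketch. It assumes the names are in a character vector (called <code>people</code> here purely for illustration). Like the editor version, it only handles a first name followed by a single last name made of letters.</p><pre>
```r
# Hypothetical input: a few of the names from the list above
people = c("Ethan Nakata", "Stepanie Foutz", "Sherell Dobrowolski")

# Group 1 = first initial, group 2 = rest of the first name,
# group 3 = last name; the replacement simply leaves group 2 out
swapped = gsub("^([A-Z])([a-z]+) ([A-Za-z]+)", "\\3, \\1.", people)

print(swapped)
```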
</pre><h2>Example 3: Import really ugly data cut and pasted from a PDF into R</h2><p>Someone thought that it would be a good idea to copy a part of <a
href="http://news.mrdwab.com/wp-content/uploads/2011/03/WFA_girls_0_5_percentiles.pdf">this table</a> and send it to you <a
href="http://news.mrdwab.com/wp-content/uploads/2011/03/unprocessed-1.txt">as a text file</a> (again, who knows why&#8230;). You want to import the data into R and use it. Can you do it efficiently? You don&#8217;t need everything, though: you are mostly interested in the &#8220;Month&#8221; column and the values in the columns titled &#8220;1st&#8221; to &#8220;99th&#8221;.</p><div
class="wp_codebox"><table><tr
id="p88436"><td
class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
</pre></td><td
class="code" id="p884code36"><pre class="txt" style="font-family:monospace;">Year: Month
Month
L
M
S
1st
3rd
5th
15th
25th
50th
75th
85th
95th
97th
99th
0: 0
0
0.3809
3.2322
0.14171
2.3
2.4
2.5
2.8
2.9
3.2
3.6
3.7
4.0
4.2
4.4
0: 1
1
0.1714
4.1873
0.13724
3.0
3.2
3.3
3.6
3.8
4.2
4.6
4.8
5.2
5.4
5.7
0: 2
2
0.0962
5.1282
0.13000
3.8
4.0
4.1
4.5
4.7
5.1
5.6
5.9
6.3
6.5
6.9
0: 3
3
0.0402
5.8458
0.12619
4.4
4.6
4.7
5.1
5.4
5.8
6.4
6.7
7.2
7.4
7.8
0: 4
4
-0.0050
6.4237
0.12402
4.8
5.1
5.2
5.6
5.9
6.4
7.0
7.3
7.9
8.1
8.6
0: 5
5
-0.0430
6.8985
0.12274
5.2
5.5
5.6
6.1
6.4
6.9
7.5
7.8
8.4
8.7
9.2
0: 6
6
-0.0756
7.2970
0.12204
5.5
5.8
6.0
6.4
6.7
7.3
7.9
8.3
8.9
9.2
9.7
0: 7
7
-0.1039
7.6422
0.12178
5.8
6.1
6.3
6.7
7.0
7.6
8.3
8.7
9.4
9.6
10.2
0: 8
8
-0.1288
7.9487
0.12181
6.0
6.3
6.5
7.0
7.3
7.9
8.6
9.0
9.7
10.0
10.6
0: 9
9
-0.1507
8.2254
0.12199
6.2
6.6
6.8
7.3
7.6
8.2
8.9
9.3
10.1
10.4
11.0
0:10
10
-0.1700
8.4800
0.12223
6.4
6.8
7.0
7.5
7.8
8.5
9.2
9.6
10.4
10.7
11.3
0:11
11
-0.1872
8.7192
0.12247
6.6
7.0
7.2
7.7
8.0
8.7
9.5
9.9
10.7
11.0
11.7
1: 0
12
-0.2024
8.9481
0.12268
6.8
7.1
7.3
7.9
8.2
8.9
9.7
10.2
11.0
11.3
12.0</pre></td></tr></table></div><p>This can actually be done with three regular expressions and two or three lines of code in R. First, the regular expressions:</p><table
width="100%" cellspacing="12px"><tr><th>Search for this</th><th>Replace with this</th><th
width="250px">Why?</th></tr><tr><td><code>^([01]:[ |0-9]+)</code></td><td>Nothing</td><td
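width="250px">This removes the values in the Year: Month column (such as <code>0: 0</code> or <code>0:10</code>). Inside square brackets, <code>|</code> is just a literal character, so <code>[ 0-9]</code> would work equally well here.</td></tr><tr><td><code>^([0-9]|[0-9-]+)\.([0-9]{4,5})</code></td><td>Nothing</td><td
width="250px">This removes the numeric values for the &#8220;L&#8221;, &#8220;M&#8221;, and &#8220;S&#8221; columns, which are the only values with four or five digits after the decimal point.</td></tr><tr><td><code>^([A-Z]{1})\r\n</code></td><td>Nothing</td><td
width="250px">This removes the lines containing just the letters &#8220;L&#8221;, &#8220;M&#8221;, and &#8220;S&#8221; (lines 3-5 of the unprocessed file). The <code>\r\n</code> assumes Windows line endings; for a file with Unix line endings, use <code>\n</code> instead.</td></tr></table><p>If you did this correctly, you should end up with a text file like <a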
href="http://news.mrdwab.com/wp-content/uploads/2011/03/unprocessed-1b.txt">the one linked here</a>.</p><div
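></div><p>Since the cleaned text is headed for R anyway, the three expressions can also be applied there directly. The sketch below is only an illustration. It runs them on the first few lines of the unprocessed file, typed in by hand as a character vector (the names <code>raw</code>, <code>step1</code>, and so on are hypothetical). One change is assumed: once the file is in R, each line is its own string with no <code>\r\n</code> at the end. The third expression therefore becomes <code>^([A-Z])$</code>, which blanks out a line holding a single capital letter rather than deleting it.</p><pre>
```r
# Hypothetical excerpt of the unprocessed file, one element per line
# (only two of the eleven percentile headings and values are included)
raw = c("Year: Month", "Month", "L", "M", "S", "1st", "3rd",
        "0: 0", "0", "0.3809", "3.2322", "0.14171", "2.3", "2.4")

# 1. Remove "Year: Month" values such as "0: 0" or "0:10"
step1 = gsub("^([01]:[ |0-9]+)", "", raw)
# 2. Remove the L, M and S values (4 or 5 digits after the decimal point)
step2 = gsub("^([0-9]|[0-9-]+)\\.([0-9]{4,5})", "", step1)
# 3. Blank out the lines that hold only "L", "M" or "S"
step3 = gsub("^([A-Z])$", "", step2)

# scan() skips blank lines, so after skipping the 8 header lines
# only the month and its percentile values are left
con = textConnection(step3)
values = scan(con, skip = 8, quiet = TRUE)
close(con)
print(values)
```
</pre><p>Because <code>scan</code> skips blank lines by default, the leftover empty strings do no harm. In this shortened excerpt, skipping the first eight lines leaves only the month (<code>0</code>) followed by its percentile values. With the full file, more lines would need to be skipped.</p><div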
class="wp_codebox"><table><tr
id="p88437"><td
class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
</pre></td><td
class="code" id="p884code37"><pre class="txt" style="font-family:monospace;">Year: Month
Month
1st
3rd
5th
15th
25th
50th
75th
85th
95th
97th
99th
&nbsp;
0
&nbsp;
&nbsp;
&nbsp;
2.3
2.4
2.5
2.8
2.9
3.2
3.6
3.7
4.0
4.2
4.4
&nbsp;
1
&nbsp;
&nbsp;
&nbsp;
3.0
3.2
3.3
3.6
3.8
4.2
4.6
4.8
5.2
5.4
5.7
&nbsp;
2
&nbsp;
&nbsp;
&nbsp;
3.8
4.0
4.1
4.5
4.7
5.1
5.6
5.9
6.3
6.5
6.9
&nbsp;
3
&nbsp;
&nbsp;
&nbsp;
4.4
4.6
4.7
5.1
5.4
5.8
6.4
6.7
7.2
7.4
7.8
&nbsp;
4
&nbsp;
&nbsp;
&nbsp;
4.8
5.1
5.2
5.6
5.9
6.4
7.0
7.3
7.9
8.1
8.6
&nbsp;
5
&nbsp;
&nbsp;
&nbsp;
5.2
5.5
5.6
6.1
6.4
6.9
7.5
7.8
8.4
8.7
9.2
&nbsp;
6
&nbsp;
&nbsp;
&nbsp;
5.5
5.8
6.0
6.4
6.7
7.3
7.9
8.3
8.9
9.2
9.7
&nbsp;
7
&nbsp;
&nbsp;
&nbsp;
5.8
6.1
6.3
6.7
7.0
7.6
8.3
8.7
9.4
9.6
10.2
&nbsp;
8
&nbsp;
&nbsp;
&nbsp;
6.0
6.3
6.5
7.0
7.3
7.9
8.6
9.0
9.7
10.0
10.6
&nbsp;
9
&nbsp;
&nbsp;
&nbsp;
6.2
6.6
6.8
7.3
7.6
8.2
8.9
9.3
10.1
10.4
11.0
&nbsp;
10
&nbsp;
&nbsp;
&nbsp;
6.4
6.8
7.0
7.5
7.8
8.5
9.2
9.6
10.4
10.7
11.3
&nbsp;
11
&nbsp;
&nbsp;
&nbsp;
6.6
7.0
7.2
7.7
8.0
8.7
9.5
9.9
10.7
11.0
11.7
&nbsp;
12
&nbsp;
&nbsp;
&nbsp;
6.8
7.1
7.3
7.9
8.2
8.9
9.7
10.2
11.0
11.3
12.0</pre></td></tr></table></div><p>Select all, copy the text, and switch over to R. We&#8217;ll first create a matrix with the values from line 15 to the end, and then create the column names for the matrix from lines 2 through 13. (Reading from <code>"clipboard"</code> this way works on Windows but may not work on other platforms; saving the text to a file and reading that file instead is a safe fallback.) Do the following magic:</p><div
class="wp_codebox"><table><tr
id="p88438"><td
class="code" id="p884code38"><pre class="rsplus" style="font-family:monospace;"><span style="color: #080;">&gt;</span> birthweight.<span style="">percentiles</span> <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">matrix</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">scan</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;clipboard&quot;</span>, skip<span style="color: #080;">=</span><span style="color: #ff0000;">14</span><span style="color: #080;">&#41;</span>, <span style="color: #0000FF; font-weight: bold;">ncol</span><span style="color: #080;">=</span><span style="color: #ff0000;">12</span>, byrow<span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">T</span><span style="color: #080;">&#41;</span>
Read <span style="color: #ff0000;">156</span> items
<span style="color: #080;">&gt;</span> <span style="color: #0000FF; font-weight: bold;">colnames</span><span style="color: #080;">&#40;</span>birthweight.<span style="">percentiles</span><span style="color: #080;">&#41;</span> <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">scan</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;clipboard&quot;</span>, what<span style="color: #080;">=</span><span style="color: #ff0000;">&quot;character&quot;</span>,
<span style="color: #080;">+</span>                                          skip<span style="color: #080;">=</span><span style="color: #ff0000;">1</span>, n<span style="color: #080;">=</span><span style="color: #ff0000;">12</span><span style="color: #080;">&#41;</span>
Read <span style="color: #ff0000;">12</span> items
<span style="color: #080;">&gt;</span> birthweight.<span style="">percentiles</span>
      Month 1st 3rd 5th 15th 25th 50th 75th 85th 95th 97th 99th
 <span style="color: #080;">&#91;</span><span style="color: #ff0000;">1</span>,<span style="color: #080;">&#93;</span>     <span style="color: #ff0000;">0</span> <span style="color: #ff0000;">2.3</span> <span style="color: #ff0000;">2.4</span> <span style="color: #ff0000;">2.5</span>  <span style="color: #ff0000;">2.8</span>  <span style="color: #ff0000;">2.9</span>  <span style="color: #ff0000;">3.2</span>  <span style="color: #ff0000;">3.6</span>  <span style="color: #ff0000;">3.7</span>  <span style="color: #ff0000;">4.0</span>  <span style="color: #ff0000;">4.2</span>  <span style="color: #ff0000;">4.4</span>
 <span style="color: #080;">&#91;</span><span style="color: #ff0000;">2</span>,<span style="color: #080;">&#93;</span>     <span style="color: #ff0000;">1</span> <span style="color: #ff0000;">3.0</span> <span style="color: #ff0000;">3.2</span> <span style="color: #ff0000;">3.3</span>  <span style="color: #ff0000;">3.6</span>  <span style="color: #ff0000;">3.8</span>  <span style="color: #ff0000;">4.2</span>  <span style="color: #ff0000;">4.6</span>  <span style="color: #ff0000;">4.8</span>  <span style="color: #ff0000;">5.2</span>  <span style="color: #ff0000;">5.4</span>  <span style="color: #ff0000;">5.7</span>
 <span style="color: #080;">&#91;</span><span style="color: #ff0000;">3</span>,<span style="color: #080;">&#93;</span>     <span style="color: #ff0000;">2</span> <span style="color: #ff0000;">3.8</span> <span style="color: #ff0000;">4.0</span> <span style="color: #ff0000;">4.1</span>  <span style="color: #ff0000;">4.5</span>  <span style="color: #ff0000;">4.7</span>  <span style="color: #ff0000;">5.1</span>  <span style="color: #ff0000;">5.6</span>  <span style="color: #ff0000;">5.9</span>  <span style="color: #ff0000;">6.3</span>  <span style="color: #ff0000;">6.5</span>  <span style="color: #ff0000;">6.9</span>
 <span style="color: #080;">&#91;</span><span style="color: #ff0000;">4</span>,<span style="color: #080;">&#93;</span>     <span style="color: #ff0000;">3</span> <span style="color: #ff0000;">4.4</span> <span style="color: #ff0000;">4.6</span> <span style="color: #ff0000;">4.7</span>  <span style="color: #ff0000;">5.1</span>  <span style="color: #ff0000;">5.4</span>  <span style="color: #ff0000;">5.8</span>  <span style="color: #ff0000;">6.4</span>  <span style="color: #ff0000;">6.7</span>  <span style="color: #ff0000;">7.2</span>  <span style="color: #ff0000;">7.4</span>  <span style="color: #ff0000;">7.8</span>
 <span style="color: #080;">&#91;</span><span style="color: #ff0000;">5</span>,<span style="color: #080;">&#93;</span>     <span style="color: #ff0000;">4</span> <span style="color: #ff0000;">4.8</span> <span style="color: #ff0000;">5.1</span> <span style="color: #ff0000;">5.2</span>  <span style="color: #ff0000;">5.6</span>  <span style="color: #ff0000;">5.9</span>  <span style="color: #ff0000;">6.4</span>  <span style="color: #ff0000;">7.0</span>  <span style="color: #ff0000;">7.3</span>  <span style="color: #ff0000;">7.9</span>  <span style="color: #ff0000;">8.1</span>  <span style="color: #ff0000;">8.6</span>
 <span style="color: #080;">&#91;</span><span style="color: #ff0000;">6</span>,<span style="color: #080;">&#93;</span>     <span style="color: #ff0000;">5</span> <span style="color: #ff0000;">5.2</span> <span style="color: #ff0000;">5.5</span> <span style="color: #ff0000;">5.6</span>  <span style="color: #ff0000;">6.1</span>  <span style="color: #ff0000;">6.4</span>  <span style="color: #ff0000;">6.9</span>  <span style="color: #ff0000;">7.5</span>  <span style="color: #ff0000;">7.8</span>  <span style="color: #ff0000;">8.4</span>  <span style="color: #ff0000;">8.7</span>  <span style="color: #ff0000;">9.2</span>
 <span style="color: #080;">&#91;</span><span style="color: #ff0000;">7</span>,<span style="color: #080;">&#93;</span>     <span style="color: #ff0000;">6</span> <span style="color: #ff0000;">5.5</span> <span style="color: #ff0000;">5.8</span> <span style="color: #ff0000;">6.0</span>  <span style="color: #ff0000;">6.4</span>  <span style="color: #ff0000;">6.7</span>  <span style="color: #ff0000;">7.3</span>  <span style="color: #ff0000;">7.9</span>  <span style="color: #ff0000;">8.3</span>  <span style="color: #ff0000;">8.9</span>  <span style="color: #ff0000;">9.2</span>  <span style="color: #ff0000;">9.7</span>
 <span style="color: #080;">&#91;</span><span style="color: #ff0000;">8</span>,<span style="color: #080;">&#93;</span>     <span style="color: #ff0000;">7</span> <span style="color: #ff0000;">5.8</span> <span style="color: #ff0000;">6.1</span> <span style="color: #ff0000;">6.3</span>  <span style="color: #ff0000;">6.7</span>  <span style="color: #ff0000;">7.0</span>  <span style="color: #ff0000;">7.6</span>  <span style="color: #ff0000;">8.3</span>  <span style="color: #ff0000;">8.7</span>  <span style="color: #ff0000;">9.4</span>  <span style="color: #ff0000;">9.6</span> <span style="color: #ff0000;">10.2</span>
 <span style="color: #080;">&#91;</span><span style="color: #ff0000;">9</span>,<span style="color: #080;">&#93;</span>     <span style="color: #ff0000;">8</span> <span style="color: #ff0000;">6.0</span> <span style="color: #ff0000;">6.3</span> <span style="color: #ff0000;">6.5</span>  <span style="color: #ff0000;">7.0</span>  <span style="color: #ff0000;">7.3</span>  <span style="color: #ff0000;">7.9</span>  <span style="color: #ff0000;">8.6</span>  <span style="color: #ff0000;">9.0</span>  <span style="color: #ff0000;">9.7</span> <span style="color: #ff0000;">10.0</span> <span style="color: #ff0000;">10.6</span>
<span style="color: #080;">&#91;</span><span style="color: #ff0000;">10</span>,<span style="color: #080;">&#93;</span>     <span style="color: #ff0000;">9</span> <span style="color: #ff0000;">6.2</span> <span style="color: #ff0000;">6.6</span> <span style="color: #ff0000;">6.8</span>  <span style="color: #ff0000;">7.3</span>  <span style="color: #ff0000;">7.6</span>  <span style="color: #ff0000;">8.2</span>  <span style="color: #ff0000;">8.9</span>  <span style="color: #ff0000;">9.3</span> <span style="color: #ff0000;">10.1</span> <span style="color: #ff0000;">10.4</span> <span style="color: #ff0000;">11.0</span>
<span style="color: #080;">&#91;</span><span style="color: #ff0000;">11</span>,<span style="color: #080;">&#93;</span>    <span style="color: #ff0000;">10</span> <span style="color: #ff0000;">6.4</span> <span style="color: #ff0000;">6.8</span> <span style="color: #ff0000;">7.0</span>  <span style="color: #ff0000;">7.5</span>  <span style="color: #ff0000;">7.8</span>  <span style="color: #ff0000;">8.5</span>  <span style="color: #ff0000;">9.2</span>  <span style="color: #ff0000;">9.6</span> <span style="color: #ff0000;">10.4</span> <span style="color: #ff0000;">10.7</span> <span style="color: #ff0000;">11.3</span>
<span style="color: #080;">&#91;</span><span style="color: #ff0000;">12</span>,<span style="color: #080;">&#93;</span>    <span style="color: #ff0000;">11</span> <span style="color: #ff0000;">6.6</span> <span style="color: #ff0000;">7.0</span> <span style="color: #ff0000;">7.2</span>  <span style="color: #ff0000;">7.7</span>  <span style="color: #ff0000;">8.0</span>  <span style="color: #ff0000;">8.7</span>  <span style="color: #ff0000;">9.5</span>  <span style="color: #ff0000;">9.9</span> <span style="color: #ff0000;">10.7</span> <span style="color: #ff0000;">11.0</span> <span style="color: #ff0000;">11.7</span>
<span style="color: #080;">&#91;</span><span style="color: #ff0000;">13</span>,<span style="color: #080;">&#93;</span>    <span style="color: #ff0000;">12</span> <span style="color: #ff0000;">6.8</span> <span style="color: #ff0000;">7.1</span> <span style="color: #ff0000;">7.3</span>  <span style="color: #ff0000;">7.9</span>  <span style="color: #ff0000;">8.2</span>  <span style="color: #ff0000;">8.9</span>  <span style="color: #ff0000;">9.7</span> <span style="color: #ff0000;">10.2</span> <span style="color: #ff0000;">11.0</span> <span style="color: #ff0000;">11.3</span> <span style="color: #ff0000;">12.0</span>
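&gt; # Sanity check: 156 values / 12 columns = 13 rows (months 0 through 12)
&gt; dim(birthweight.percentiles)
[1] 13 12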
<span style="color: #080;">&gt;</span> <span style="color: #228B22;"># Optional: If you need or prefer a data frame instead of a matrix, run:</span>
<span style="color: #080;">&gt;</span> <span style="color: #228B22;"># birthweight.percentiles = as.data.frame(birthweight.percentiles)</span></pre></td></tr></table></div><p>The first command scans in the data values and arranges them into a matrix with 12 columns, filling it row by row, from left to right. The second command adds the column names. Notice how &#8220;skip&#8221; and &#8220;n&#8221; let us select just the values we wanted at each stage: <code>skip=14</code> starts reading at line 15, while <code>skip=1, n=12</code> reads only the 12 labels on lines 2 through 13.</p><h2>Example 4 &#8211; WordPress.com&#8217;s monthly stats table is ugly</h2><p>It really is! Here&#8217;s a screenshot. It looks pretty, right?</p><div
id="attachment_905" class="wp-caption aligncenter" style="width: 410px"><a
href="http://news.mrdwab.com/wp-content/uploads/2011/03/WordPress-Stats.png" rel="lightbox[884]"><img
src="http://news.mrdwab.com/wp-content/uploads/2011/03/WordPress-Stats-400x218.png" alt="" title="WordPress Stats" width="400" height="218" class="size-medium wp-image-905" /></a><p
class="wp-caption-text">WordPress.com&#039;s monthly stats view. Pretty online, but not easy to copy and paste into other applications.</p></div><p>But if you copy it into an Excel spreadsheet, you get this:</p><p><iframe
width='500' height='300' frameborder='0' src='https://spreadsheets.google.com/pub?hl=en&#038;hl=en&#038;key=0An2f7Ho_4e0fdEFVN3Ezc1h6V2hQNm5pLU1mQlRMQWc&#038;single=true&#038;gid=0&#038;output=html&#038;widget=true'></iframe></p><p>Ugh. How are we supposed to work with this?</p><p>Well, it depends on whether you copied from Excel to your text editor, or copied directly from the WordPress stats screen to your text editor. I&#8217;ll cover both scenarios below.</p><p><strong><em>Copying from WordPress to Excel to your text editor</em></strong></p><p>If you copied from WordPress to Excel to your text editor, you&#8217;d end up with a text file <a
href="http://news.mrdwab.com/wp-content/uploads/2011/03/unprocessed-2.txt">like the one linked here</a>.</p><div
class="wp_codebox_msgheader wp_codebox_hide"></div><div
class="wp_codebox"><table><tr
id="p88439"><td
class="code" id="p884code39"><pre class="txt" style="font-family:monospace;">24-Jan	25-Jan	26-Jan	27-Jan	28-Jan	29-Jan	30-Jan	201	29
&nbsp;
&nbsp;
19	27	14	32	73	25	11
&nbsp;
31-Jan	1-Feb	2-Feb	3-Feb	4-Feb	5-Feb	6-Feb	302	43	50.25%
&nbsp;
&nbsp;
23	15	63	49	52	29	71
&nbsp;
7-Feb	8-Feb	9-Feb	10-Feb	11-Feb	12-Feb	13-Feb	369	53	22.19%
&nbsp;
&nbsp;
59	35	24	88	29	96	38
&nbsp;
14-Feb	15-Feb	16-Feb	17-Feb	18-Feb	19-Feb	20-Feb	376	54	1.90%
&nbsp;
&nbsp;
115	96	60	41	15	24	25
&nbsp;
21-Feb	22-Feb	23-Feb	24-Feb	25-Feb	26-Feb	27-Feb	291	42	-22.61%
&nbsp;
&nbsp;
51	29	57	23	79	21	31
&nbsp;
28-Feb	1-Mar	2-Mar	3-Mar	4-Mar			187	40	-3.18%
&nbsp;
&nbsp;
25	54	36	46	26</pre></td></tr></table></div><p>This is easy to clean up and import into R. Search for <code>^([0-9]{1,2}-[A-Z].*)</code> (any line that starts with a date such as &#8220;24-Jan&#8221;), replace it with nothing, copy what&#8217;s left, and read it into R with something like <code>&gt; data = scan("clipboard")</code>.</p><p>Notice that this sacrifices the dates, along with the weekly summary figures at the end of those lines.</p><p><strong><em>Copying from WordPress to your text editor</em></strong></p><p>If you copied from WordPress to your text editor, you&#8217;d end up with a text file <a
href="http://news.mrdwab.com/wp-content/uploads/2011/03/unprocessed-3.txt">like the one linked here</a>.</p><div
class="wp_codebox_msgheader wp_codebox_hide"></div><div
class="wp_codebox"><table><tr
id="p88440"><td
class="code" id="p884code40"><pre class="txt" style="font-family:monospace;">Jan 24
&nbsp;
19
Jan 25
&nbsp;
27
Jan 26
&nbsp;
14
Jan 27
&nbsp;
32
Jan 28
&nbsp;
73
Jan 29
&nbsp;
25
Jan 30
&nbsp;
11
201	29
Jan 31
&nbsp;
23
Feb 1
&nbsp;
15
Feb 2
&nbsp;
63
Feb 3
&nbsp;
49
Feb 4
&nbsp;
52
Feb 5
&nbsp;
29
Feb 6
&nbsp;
71
302	43	+50.25%
Feb 7
&nbsp;
59
Feb 8
&nbsp;
35
Feb 9
&nbsp;
24
Feb 10
&nbsp;
88
Feb 11
&nbsp;
29
Feb 12
&nbsp;
96
Feb 13
&nbsp;
38
369	53	+22.19%
Feb 14
&nbsp;
115
Feb 15
&nbsp;
96
Feb 16
&nbsp;
60
Feb 17
&nbsp;
41
Feb 18
&nbsp;
15
Feb 19
&nbsp;
24
Feb 20
&nbsp;
25
376	54	+1.90%
Feb 21
&nbsp;
51
Feb 22
&nbsp;
29
Feb 23
&nbsp;
57
Feb 24
&nbsp;
23
Feb 25
&nbsp;
79
Feb 26
&nbsp;
21
Feb 27
&nbsp;
31
291	42	-22.61%
Feb 28
&nbsp;
25
Mar 1
&nbsp;
54
Mar 2
&nbsp;
36
Mar 3
&nbsp;
46
Mar 4
&nbsp;
26
187	40	-3.18%</pre></td></tr></table></div><p>From this file, we can search and replace in a way that either keeps the dates or drops them and keeps just the stats.</p><p><strong><em><u>Keep the dates</u></em></strong></p><table
width="100%" cellspacing="12px"><tr><th>Search for this</th><th>Replace with this</th><th
width="250px">Why?</th></tr><tr><td><code>\r\n\r\n</code></td><td><code>\t</code></td><td
width="250px">This step uses &#8220;escape sequences&#8221; rather than regular expressions. We&#8217;re replacing each pair of line breaks (the end of a line plus the blank line after it) with a tab. This results in lines like &#8220;Date[tab]number-of-visits&#8221;, and it also sets up the next search and replace.</td></tr><tr><td><code>^([0-9].*)</code></td><td>Nothing</td><td
width="250px">In the previous step, we ended up with a convenient scenario: the lines we&#8217;re interested in start with a letter (the month), and all the other lines (the weekly totals) start with a number. We can now easily remove those lines.</td></tr></table><p>You can now copy this to your clipboard and read it into R using <code>&gt; data = read.delim("clipboard", header=FALSE)</code>. Using <code>read.delim()</code> splits the columns on tabs only, so a date like &#8220;Jan 24&#8221; stays in one column, and <code>header=FALSE</code> stops R from treating the first line as column names.</p><p><strong><em>Just gimme the data</em></strong></p><p>Do everything you did before, but add one more regular expression search and replace. Search for <code>^([A-Z].*)\t(.*)</code> and replace it with <code>\2</code>. This leaves a list of just the data, which can be read into R with something like <code>&gt; data = scan("clipboard")</code>.</p><h2>Example 5 &#8211; Merged cells are a pain&#8230;</h2><p>Your friend gives you <a
href="http://news.mrdwab.com/wp-content/uploads/2011/03/REGEX-TEST.pdf">this PDF</a> with a beautiful table in it, and you need to extract the data. When you copy it into Microsoft Word, OpenOffice Writer, or whatever you prefer, it looks terrible, like the unformatted text below:</p><pre>
1. Organization M (117) 1 Andhra Pradesh 7
2 Arunachal Pradesh 8
3 Assam 8
4 Bihar 24
5 Chattisgarh 2
6 Goa 15
7 Gujarat 19
8 Haryana 4
9 Himachal Pradesh 14
10 Jammu and Kashmir 2
11 Jharkhand 2
12 Karnataka 4
13 Kerala 2
14 Madhya Pradesh 2
15 Maharashtra 2
16 Manipur 2
2. Foundation X (69) 17 Meghalaya 29
18 Mizoram 10
19 Nagaland 4
20 Odisha 12
21 Puducherry 14
3. NGO Z (8) 22 Punjab 8
4. Government (16) 23 Rajasthan 16
5. Research Institute A (8) 24 Sikkim 4
25 Tamil Nadu 4
6. Organization C (36) 26 Tripura 8
27 Uttar Pradesh 15
28 Uttarakhand 1
29 West Bengal 12
</pre><p>This is also pretty easy to fix with the following two regular expressions.</p><table
width="100%" cellspacing="12px"><tr><th>Search for this</th><th>Replace with this</th><th
width="250px">Why?</th></tr><tr><td><code>^([0-9]\. )(.*)\(([0-9]+)\)( )</code></td><td><code>\t\2\t\r\n</code></td><td
width="250px">We match the numbered organization name and its count in parentheses, keep only the name, and put it on a line by itself, preceded and followed by a tab. The state entry that shared its line moves down to the next line.</td></tr><tr><td><code>^([0-9]+) (.*) (.*)</code></td><td><code>\1\t\2\t\3</code></td><td
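width="250px">We want to break up the state information into three parts (row number, state name, count) and separate them with tabs. Because <code>(.*)</code> is greedy, the middle group absorbs multi-word names like &#8220;Jammu and Kashmir&#8221;, and the last group gets only the final number.</td></tr></table><p>After these two steps, your data should look like this:</p><pre>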
	Organization M
1	Andhra Pradesh	7
2	Arunachal Pradesh	8
3	Assam	8
4	Bihar	24
5	Chattisgarh	2
6	Goa	15
7	Gujarat	19
8	Haryana	4
9	Himachal Pradesh	14
10	Jammu and Kashmir	2
11	Jharkhand	2
12	Karnataka	4
13	Kerala	2
14	Madhya Pradesh	2
15	Maharashtra	2
16	Manipur	2
	Foundation X
17	Meghalaya	29
18	Mizoram	10
19	Nagaland	4
20	Odisha	12
21	Puducherry	14
	NGO Z
22	Punjab	8
	Government
23	Rajasthan	16
	Research Institute A
24	Sikkim	4
25	Tamil Nadu	4
	Organization C
26	Tripura	8
27	Uttar Pradesh	15
28	Uttarakhand	1
29	West Bengal	12
</pre><p>This is now a tab-delimited file, so it can easily be imported into any program that reads such files, or pasted into a word processor and converted into a table with the &#8220;text to table&#8221; feature that any decent word processor should have.</p><h2>Bonus session</h2><p>Imagine that (for whatever reason) you wanted to use the table from Example 5 in R. Here&#8217;s how you can do it. First, count how many states each organization is working in and make a note of that. In this case, it&#8217;s <code>16, 5, 1, 1, 2, 4</code>. Then, do the following:</p><table
width="100%" cellspacing="12px"><tr><th>Search for this</th><th>Replace with this</th><th
width="250px">Why?</th></tr><tr><td><code>^([0-9]\. )(.*) \(.*</code></td><td><code>"\2"</code></td><td
width="250px">We want just the organization&#8217;s name, and this is the first step toward that. Be sure to put the replace reference in quotes: R&#8217;s <code>scan()</code> will then read a multi-word name like &#8220;Organization M&#8221; as a single item.</td></tr><tr><td><code>^[0-9].*</code></td><td>Nothing</td><td
width="250px">This deletes the state lines, leaving just the name of each organization (with a lot of blank lines in between, which <code>scan()</code> skips).</td></tr></table><p>Copy whatever is in your text editor, switch over to R, and enter the following:</p><div
class="wp_codebox_msgheader"></div><div
class="wp_codebox"><table><tr
id="p88441"><td
class="code" id="p884code41"><pre class="rsplus" style="font-family:monospace;"><span style="color: #080;">&gt;</span> orgs <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">rep</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">scan</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;clipboard&quot;</span>, what<span style="color: #080;">=</span><span style="color: #ff0000;">&quot;character&quot;</span><span style="color: #080;">&#41;</span>, <span style="color: #0000FF; font-weight: bold;">c</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">16</span>, <span style="color: #ff0000;">5</span>, <span style="color: #ff0000;">1</span>, <span style="color: #ff0000;">1</span>, <span style="color: #ff0000;">2</span>, <span style="color: #ff0000;">4</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span></pre></td></tr></table></div><p>Switch back to your text editor and revert the file to its original state (just do a couple of undos). Then, do what you did in Example 5, except:</p><ul><li>In the first step, instead of replacing with <code>\t\2\t\r\n</code>, replace with nothing, so that only the state entry remains on that line.</li><li>In the next step, instead of replacing with <code>\1\t\2\t\3</code>, replace with <code>\2\t\3</code>.</li></ul><p>Again, copy whatever&#8217;s in your text editor, switch back over to R, and do the following:</p><div
class="wp_codebox_msgheader"></div><div
class="wp_codebox"><table><tr
id="p88442"><td
class="code" id="p884code42"><pre class="rsplus" style="font-family:monospace;"><span style="color: #080;">&gt;</span> organizations <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">read.<span style="">delim</span></span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;clipboard&quot;</span>, header<span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">FALSE</span><span style="color: #080;">&#41;</span>
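&gt; # Check before combining: 16 + 5 + 1 + 1 + 2 + 4 = 29 labels, one per state row
&gt; length(orgs) == nrow(organizations)
[1] TRUE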
<span style="color: #080;">&gt;</span> <span style="color: #0000FF; font-weight: bold;">cbind</span><span style="color: #080;">&#40;</span>orgs, organizations<span style="color: #080;">&#41;</span>
                   orgs                V1 V2
<span style="color: #ff0000;">1</span>        Organization M    Andhra Pradesh  <span style="color: #ff0000;">7</span>
<span style="color: #ff0000;">2</span>        Organization M Arunachal Pradesh  <span style="color: #ff0000;">8</span>
<span style="color: #ff0000;">3</span>        Organization M             Assam  <span style="color: #ff0000;">8</span>
<span style="color: #ff0000;">4</span>        Organization M             Bihar <span style="color: #ff0000;">24</span>
<span style="color: #ff0000;">5</span>        Organization M       Chattisgarh  <span style="color: #ff0000;">2</span>
<span style="color: #ff0000;">6</span>        Organization M               Goa <span style="color: #ff0000;">15</span>
<span style="color: #ff0000;">7</span>        Organization M           Gujarat <span style="color: #ff0000;">19</span>
<span style="color: #ff0000;">8</span>        Organization M           Haryana  <span style="color: #ff0000;">4</span>
<span style="color: #ff0000;">9</span>        Organization M  Himachal Pradesh <span style="color: #ff0000;">14</span>
<span style="color: #ff0000;">10</span>       Organization M Jammu and Kashmir  <span style="color: #ff0000;">2</span>
<span style="color: #ff0000;">11</span>       Organization M         Jharkhand  <span style="color: #ff0000;">2</span>
<span style="color: #ff0000;">12</span>       Organization M         Karnataka  <span style="color: #ff0000;">4</span>
<span style="color: #ff0000;">13</span>       Organization M            Kerala  <span style="color: #ff0000;">2</span>
<span style="color: #ff0000;">14</span>       Organization M    Madhya Pradesh  <span style="color: #ff0000;">2</span>
<span style="color: #ff0000;">15</span>       Organization M       Maharashtra  <span style="color: #ff0000;">2</span>
<span style="color: #ff0000;">16</span>       Organization M           Manipur  <span style="color: #ff0000;">2</span>
<span style="color: #ff0000;">17</span>         Foundation X         Meghalaya <span style="color: #ff0000;">29</span>
<span style="color: #ff0000;">18</span>         Foundation X           Mizoram <span style="color: #ff0000;">10</span>
<span style="color: #ff0000;">19</span>         Foundation X          Nagaland  <span style="color: #ff0000;">4</span>
<span style="color: #ff0000;">20</span>         Foundation X            Odisha <span style="color: #ff0000;">12</span>
<span style="color: #ff0000;">21</span>         Foundation X        Puducherry <span style="color: #ff0000;">14</span>
<span style="color: #ff0000;">22</span>                NGO Z            Punjab  <span style="color: #ff0000;">8</span>
<span style="color: #ff0000;">23</span>           Government         Rajasthan <span style="color: #ff0000;">16</span>
<span style="color: #ff0000;">24</span> Research Institute A            Sikkim  <span style="color: #ff0000;">4</span>
<span style="color: #ff0000;">25</span> Research Institute A        Tamil Nadu  <span style="color: #ff0000;">4</span>
<span style="color: #ff0000;">26</span>       Organization C           Tripura  <span style="color: #ff0000;">8</span>
<span style="color: #ff0000;">27</span>       Organization C     Uttar Pradesh <span style="color: #ff0000;">15</span>
<span style="color: #ff0000;">28</span>       Organization C       Uttarakhand  <span style="color: #ff0000;">1</span>
<span style="color: #ff0000;">29</span>       Organization C       West Bengal <span style="color: #ff0000;">12</span></pre></td></tr></table></div> ]]></content:encoded> <wfw:commentRss>http://news.mrdwab.com/2011/03/07/sounds-interesting-is-that-a-regular-expression/feed/</wfw:commentRss> <slash:comments>0</slash:comments> </item> </channel> </rss>
