Group DateTime rows in datatable by DateTime - C#

4

I have a LARGE DataTable (500k-1m rows). Without going into detail, this is a requirement: the end user needs/wants to be able to see all of the data. It's on a local server, so bandwidth etc. is not a concern for me.

I have a DateTime field in the DataTable which I need to group, let me explain what I mean by grouping... It's probably not what you think I mean (from looking at the other questions on here!).

        var table = new DataTable();
        table.Columns.Add("EventTime", typeof(DateTime));
        table.Columns.Add("Result", typeof(String));
        table.Columns.Add("ValueOne", typeof(Int32));
        table.Columns.Add("ValueTwo", typeof(Int32));
        table.Rows.Add("2012-02-06 12:41:45.190", "A", "7", "0");
        table.Rows.Add("2012-02-06 12:45:41.190", "B", "3", "89");
        table.Rows.Add("2012-02-06 12:59:41.190", "C", "1", "0");
        table.Rows.Add("2012-02-06 13:41:41.190", "D", "0", "28");
        table.Rows.Add("2012-02-06 17:41:41.190", "E", "0", "37");
        table.Rows.Add("2012-02-07 12:41:45.190", "F", "48", "23");

I would expect the above table to be grouped so that I get a sum of the "ValueOne" column, and an average of the "ValueTwo" column. I need the grouping to be a little bit flexible so that I can specify that I want grouping by minutes (only the first and last rows would be grouped, the rest would just provide their values), or by days (all but the last row would be grouped into a single row), etc.

I've tried this a few times but I'm getting nowhere. My LINQ knowledge isn't great, but I thought I'd be able to do this!

Note: The DataTable is already on the machine for calculations/views which cannot be changed, so saying "Stop being an idiot, filter in SQL!!!" is a valid answer, just useless to me! :-D

Also, in case you missed it in the title, I need this in C# - I'm working with .NET 4.0...

Thanks in advance, assuming you decide to help! :-)

2012-04-03 20:25
by Faraday
Even though you have the data loaded locally already, and you don't care about performance, it's worth pointing out that doing a LINQ query against a Linq-to-SQL or Entity context will be a lot easier from a code perspective - StriplingWarrior 2012-04-03 20:31
@StriplingWarrior: Why is a LINQ-To-SQL query easier than a LINQ-To-DataSet query - Rango 2012-04-03 20:33
@TimSchmelter: Because Datasets have no structure that's known at compile-time. You have to do contortions to cast values and use indexers rather than just using simple property-getting syntax - StriplingWarrior 2012-04-03 20:38
Stripling - Could you expand on what you just said. I'm not against improving performance! It's just that the few times I ask questions like this people usually shout at me saying I'm loading too much data and I'm an idiot! :) Please do explain, I'd love to actually understand what you just said - Faraday 2012-04-03 20:40
@StriplingWarrior: Why is row.Field<DateTime>("EventTime") a contortion or an index? (not to mention a typed DataSet) - Rango 2012-04-03 20:41
@TimSchmelter: row.Field<DateTime>("EventTime") feels like I'm doing contortions compared to event.EventTime. It requires both a cast and a "magic string" value. It's an indexer because I'm asking the row for the value at index "EventTime", and it's not type-safe because if you changed the type of the "EventTime" field, the compiler wouldn't complain. I'm not clear on what a typed DataSet has to do with it, but I'm open to be enlightened - StriplingWarrior 2012-04-03 20:58
@user1311339: Nevermind: As I was answering I realized that implementing this in one of those frameworks would be a little more complicated because you're trying to get individual pieces off of a DateTime value, which would require some special method calls - StriplingWarrior 2012-04-03 21:03
@StriplingWarrior: You're right about the first part: LINQ-To-DataSet is not a replacement for (LINQ-to-)SQL or LINQ-To-Entities. But it's not more difficult, as you first claimed, and for certain requirements it's an absolutely viable approach (e.g. only a few DataTables already on the server, synchronization of different DBMSs even across multiple servers, and so on). A typed DataSet needs no indexer and no casting and is aware of the data model, so it avoids all of the disadvantages you mentioned, while still being an extension of a weakly typed DataSet (that's the relation) - Rango 2012-04-03 21:30


5

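The other three answers are close, but as you pointed out, they group on a single component of the time (for example the seconds value, regardless of date, hour, or minute), so events that share a seconds value end up together. What you want is to group events that happened within the same actual second. Try this: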

var query = from r in table.Rows.Cast<DataRow>()
        let eventTime = (DateTime)r[0]
        group r by new DateTime(eventTime.Year, eventTime.Month, eventTime.Day, eventTime.Hour, eventTime.Minute, eventTime.Second)
            into g
        select new {
                g.Key,
                Sum = g.Sum(r => (int)r[2]),
                Average = g.Average(r => (int)r[3])
            };

You can adjust what information you pass to the DateTime constructor to group by different time parts.
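
As an illustration of making the grouping unit configurable, here is a minimal sketch that rounds each timestamp down to the start of a fixed-length bucket and groups on the result. This is an alternative to the constructor approach above, not part of the original answer. The names `TimeBucketExample`, `Truncate`, and `Run` are hypothetical, and it assumes the bucket is a fixed-length `TimeSpan` (seconds, minutes, hours, days), so it does not cover calendar months or years.

```csharp
using System;
using System.Data;
using System.Linq;

// Hypothetical helper class, for illustration only.
public static class TimeBucketExample
{
    // Rounds a DateTime down to the start of its bucket (e.g. the minute or the day).
    // Assumes 'bucket' is a positive, fixed-length span.
    public static DateTime Truncate(DateTime value, TimeSpan bucket)
    {
        return new DateTime(value.Ticks - (value.Ticks % bucket.Ticks), value.Kind);
    }

    // Builds the question's sample data, groups it by day, and prints one line per group.
    public static void Run()
    {
        var table = new DataTable();
        table.Columns.Add("EventTime", typeof(DateTime));
        table.Columns.Add("ValueOne", typeof(int));
        table.Columns.Add("ValueTwo", typeof(int));
        table.Rows.Add(new DateTime(2012, 2, 6, 12, 41, 45, 190), 7, 0);
        table.Rows.Add(new DateTime(2012, 2, 6, 12, 45, 41, 190), 3, 89);
        table.Rows.Add(new DateTime(2012, 2, 6, 12, 59, 41, 190), 1, 0);
        table.Rows.Add(new DateTime(2012, 2, 6, 13, 41, 41, 190), 0, 28);
        table.Rows.Add(new DateTime(2012, 2, 6, 17, 41, 41, 190), 0, 37);
        table.Rows.Add(new DateTime(2012, 2, 7, 12, 41, 45, 190), 48, 23);

        // Swap in TimeSpan.FromMinutes(1), TimeSpan.FromHours(1), etc. to change the unit.
        TimeSpan bucket = TimeSpan.FromDays(1);

        var query = from r in table.Rows.Cast<DataRow>()
                    group r by Truncate((DateTime)r["EventTime"], bucket) into g
                    orderby g.Key
                    select new
                    {
                        EventTime = g.Key,
                        Sum = g.Sum(row => (int)row["ValueOne"]),
                        Average = g.Average(row => (int)row["ValueTwo"])
                    };

        foreach (var item in query)
        {
            Console.WriteLine("{0:yyyy-MM-dd HH:mm:ss} sum={1} avg={2}",
                item.EventTime, item.Sum, item.Average);
        }
    }
}
```

Calling `TimeBucketExample.Run()` from your own `Main` prints one line per day. Because the key is a full `DateTime`, events on different dates never share a group.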

2012-04-03 20:55
by David Nelson
Just to be clear, are you saying that their answers will group 03/04/2012 10:00:01 with 11/10/2099 10:00:01 - Faraday 2012-04-03 21:05
I get "Cannot resolve symbol let" - Any ideas, please bear with me.. I'm only a beginner! - Faraday 2012-04-03 21:17
Yes that is what I am saying - David Nelson 2012-04-03 21:32
Sorry, I mistyped the from clause. I have updated the code - David Nelson 2012-04-03 21:33
Are you there? - Faraday 2012-04-03 21:34
Sorry, you were a little quicker than I. Now I get cannot resolve symbol groupby... I'm so sorry for asking for so much help... I promise to stay quiet for a while after this one - Faraday 2012-04-03 21:36
My fault for not paying attention. I mixed method syntax and comprehension syntax. I updated again, and checked that it compiles this time - David Nelson 2012-04-03 22:16
How would I get the full EventTime rather than the truncated one? By this I mean if I select Hour, then I don't want the int for the hour, I want the full date, down to the hour.. - Faraday 2012-04-03 22:21
Construct a new DateTime that omits everything after the hour, just like my example does - David Nelson 2012-04-03 22:23
No need to apologise, I'm just amazed that you're still helping me! I really do appreciate this soooo much! :) I'm working on it too, so I'll paste back what I come up with to see whether you approve since you clearly know a lot more than I do - Faraday 2012-04-03 22:23
I tried to move this to chat, but apparently I need a better reputation! :-S How can I access a single value for the row? I can see how you do average/sum and get the key (still don't fully understand how that's mydatetime value...) but let's say the datatable has another value "Name", how would I get the value of "Name", I assume that this would also need to be grouped, I'll keep trying, but if you have a quick moment I'd appreciate the tip - Faraday 2012-04-03 22:40
Key is the value of the group by expression. If you want to select additional properties, they need to be included by making the group by expression an anonymous type: new { EventTime = new DateTime(...), Name = (string)r[1] } - David Nelson 2012-04-03 22:43
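
The anonymous-type key described in the comment above can be sketched roughly as follows. This is an illustrative example rather than the commenter's exact code: it uses the question's "Result" column in place of the hypothetical "Name" column, truncates to the hour, and the class name `CompositeKeyExample` and method `Run` are made up for the sketch.

```csharp
using System;
using System.Data;
using System.Linq;

// Hypothetical example class, for illustration only.
public static class CompositeKeyExample
{
    public static void Run()
    {
        var table = new DataTable();
        table.Columns.Add("EventTime", typeof(DateTime));
        table.Columns.Add("Result", typeof(string));
        table.Columns.Add("ValueOne", typeof(int));
        table.Rows.Add(new DateTime(2012, 2, 6, 12, 41, 45), "A", 7);
        table.Rows.Add(new DateTime(2012, 2, 6, 12, 45, 41), "A", 3);
        table.Rows.Add(new DateTime(2012, 2, 6, 12, 59, 41), "B", 1);

        var query = from r in table.Rows.Cast<DataRow>()
                    let t = (DateTime)r["EventTime"]
                    // Anonymous types compare by value, so rows with the same
                    // truncated hour AND the same Result land in the same group.
                    group r by new
                    {
                        EventTime = new DateTime(t.Year, t.Month, t.Day, t.Hour, 0, 0),
                        Result = (string)r["Result"]
                    } into g
                    select new
                    {
                        g.Key.EventTime,
                        g.Key.Result,
                        Sum = g.Sum(row => (int)row["ValueOne"])
                    };

        foreach (var item in query)
        {
            Console.WriteLine("{0:yyyy-MM-dd HH:mm} {1} sum={2}",
                item.EventTime, item.Result, item.Sum);
        }
    }
}
```

Every extra property added to the key splits the groups further, so only include columns you actually want to group on.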


1

The only thing you need to change is the property you want to group by.

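// Assumes DataSource is a strongly typed sequence (items exposing EventTime, ValueOne, ValueTwo),
// not a DataTable. Note that Minute is only the minute-of-hour component (0-59).
var query = from x in DataSource
            group x by x.EventTime.Minute into g
            select new
            {
              Unit = g.Key,
              SumValueOne = g.Sum(y => y.ValueOne),
              AverageValueTwo = g.Average(y => y.ValueTwo)
            };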
2012-04-03 20:32
by Aducci
I really like the look of this answer... I just cut and pasted it into Visual Studio to see exactly what you were doing, and it doesn't like DataSource being a DataTable, and if I put DataTable.Rows then it complains more! Hopefully you know what I mean.. - Faraday 2012-04-03 20:44


1

Something like this should work:

DataTable dt = GetDataTableResults();

var results = from row in dt.AsEnumerable()
              group row by new { EventDate = row.Field<DateTime>("EventTime").Date } into rowgroup
              select new
              {
                  EventDate = rowgroup.Key.EventDate,
                  ValueOne = rowgroup.Sum(r => r.Field<int>("ValueOne")),
                  // The column is Int32, so Field<decimal> would throw InvalidCastException.
                  ValueTwo = rowgroup.Average(r => r.Field<int>("ValueTwo"))
              };  
2012-04-03 20:38
by James Johnson
I'm probably missing something VERY obvious, but how does it know what part of the datetime to group by - Faraday 2012-04-03 20:41
Well, in this example it's grouping on the date only, ignoring the time. If you need different grouping criteria, you can change row.Field<DateTime>("EventTime").Date to whatever you need - James Johnson 2012-04-03 20:48
Wouldn't this ignore the year/month/day and just say "The second is the same so I'll group it" if I chose to groupby second? Or does it take the other fields into account also - Faraday 2012-04-03 22:02
How would I get the full EventTime rather than the truncated one? By this I mean if I select Hour, then I don't want the int for the hour, I want the full date, down to the hour.. - Faraday 2012-04-03 22:21
I would format it as a string and group on the string. For a format you can use "MM/dd/yyyy HH" (sorry, doing this on my phone) - James Johnson 2012-04-03 22:47


0

Here's what your baseline code could look like:

var query = table.Rows.Cast<DataRow>()
    .GroupBy(r => ((DateTime)r[0]).Second)
    .Select(g => new
                 {
                    g.Key, 
                    Sum = g.Sum(r => (int)r[2]),
                    Average = g.Average(r => (int)r[3])
                 });

To add flexibility, you could have something like this:

IEnumerable<IGrouping<object, DataRow>> Group(IEnumerable<DataRow> rows, GroupType groupType)
{
    // switch case would be preferable, but you get the idea.
    if(groupType == GroupType.Minutes) return rows.GroupBy(r => ((object)((DateTime)r[0]).Minute));
    if(groupType == GroupType.Seconds) return rows.GroupBy(r => ((object)((DateTime)r[0]).Second));
    ...
}

var baseQuery = table.Rows.Cast<DataRow>();
var grouped = Group(baseQuery, groupType);
var query = grouped
    .Select(g => new
                 {
                    g.Key, 
                    Sum = g.Sum(r => (int)r[2]),
                    Average = g.Average(r => (int)r[3])
                 });
2012-04-03 20:38
by StriplingWarrior
Would that not ignore the other parts of the date completely - Faraday 2012-04-03 20:42
What I mean is, wouldn't this ignore the day and just say "The second is the same so I'll group it"? Also, I don't think this is the "doing a LINQ query against a Linq-to-SQL or Entity context" answer you suggested.. - Faraday 2012-04-03 20:48
@user1311339: That was just intended to give you an idea of where to start. See my update for how to change what you group by based on an argument. - StriplingWarrior 2012-04-03 20:49
Oh wow... OK, one last question for you.... What data type is query now, and how can I make it a DataTable? (There is a charting control we use which needs a DataTable :-S) Sorry for bugging you, and thank you so much for your advice already - Faraday 2012-04-03 20:58
@user1311339: query is now an IEnumerable<> of an anonymous type that has Key, Sum, and Average properties. You'd have to create a datatable out of it yourself. And yes, the way I've implemented this would only be useful for finding out which months tend to be busiest, e.g., whereas you'll need to combine my strategy with David Nelson's GroupBy structure to actually group by month the way it sounds like you want to - StriplingWarrior 2012-04-03 21:10
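
For the "create a DataTable out of it yourself" step mentioned in the last comment, a minimal sketch might look like the following. It combines the truncated-DateTime key from David Nelson's answer with a manually built result table. The class name `GroupedTableExample`, the method `GroupByHour`, and the output column names are assumptions made for illustration, and the hour granularity is just one example.

```csharp
using System;
using System.Data;
using System.Linq;

// Hypothetical example class, for illustration only.
public static class GroupedTableExample
{
    // Groups 'source' by EventTime truncated to the hour and returns the
    // aggregates as a new DataTable (e.g. for a control that needs one).
    // Input column names follow the question's sample table.
    public static DataTable GroupByHour(DataTable source)
    {
        var result = new DataTable();
        result.Columns.Add("EventTime", typeof(DateTime));
        result.Columns.Add("SumValueOne", typeof(int));
        result.Columns.Add("AverageValueTwo", typeof(double));

        var groups = from r in source.Rows.Cast<DataRow>()
                     let t = (DateTime)r["EventTime"]
                     group r by new DateTime(t.Year, t.Month, t.Day, t.Hour, 0, 0) into g
                     orderby g.Key
                     select g;

        foreach (var g in groups)
        {
            // One output row per group: key, sum of ValueOne, average of ValueTwo.
            result.Rows.Add(
                g.Key,
                g.Sum(row => (int)row["ValueOne"]),
                g.Average(row => (int)row["ValueTwo"]));
        }

        return result;
    }
}
```

The returned table has one row per hour that contains at least one event; hours with no events are simply absent rather than filled with zeros.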