Table of Contents
- Grouping and Aggregating
- Data cleaning
- Time series data
- Reading and writing data
Grouping and Aggregating
import pandas as pd
df = pd.read_csv('data/survey_results_public.csv',index_col='Respondent')
pd.set_option('display.max_columns', 85)
pd.set_option('display.max_rows', 85)
df.head(3)
| MainBranch | Hobbyist | OpenSourcer | OpenSource | Employment | Country | Student | EdLevel | UndergradMajor | EduOther | OrgSize | DevType | YearsCode | Age1stCode | YearsCodePro | CareerSat | JobSat | MgrIdiot | MgrMoney | MgrWant | JobSeek | LastHireDate | LastInt | FizzBuzz | JobFactors | ResumeUpdate | CurrencySymbol | CurrencyDesc | CompTotal | CompFreq | ConvertedComp | WorkWeekHrs | WorkPlan | WorkChallenge | WorkRemote | WorkLoc | ImpSyn | CodeRev | CodeRevHrs | UnitTests | PurchaseHow | PurchaseWhat | LanguageWorkedWith | LanguageDesireNextYear | DatabaseWorkedWith | DatabaseDesireNextYear | PlatformWorkedWith | PlatformDesireNextYear | WebFrameWorkedWith | WebFrameDesireNextYear | MiscTechWorkedWith | MiscTechDesireNextYear | DevEnviron | OpSys | Containers | BlockchainOrg | BlockchainIs | BetterLife | ITperson | OffOn | SocialMedia | Extraversion | ScreenName | SOVisit1st | SOVisitFreq | SOVisitTo | SOFindAnswer | SOTimeSaved | SOHowMuchTime | SOAccount | SOPartFreq | SOJobs | EntTeams | SOComm | WelcomeChange | SONewContent | Age | Gender | Trans | Sexuality | Ethnicity | Dependents | SurveyLength | SurveyEase |
|---|
| Respondent | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
|---|
| 1 | I am a student who is learning to code | Yes | Never | The quality of OSS and closed source software ... | Not employed, and not looking for work | United Kingdom | No | Primary/elementary school | NaN | Taught yourself a new language, framework, or ... | NaN | NaN | 4 | 10 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | HTML/CSS;Java;JavaScript;Python | C;C++;C#;Go;HTML/CSS;Java;JavaScript;Python;SQL | SQLite | MySQL | MacOS;Windows | Android;Arduino;Windows | Django;Flask | Flask;jQuery | Node.js | Node.js | IntelliJ;Notepad++;PyCharm | Windows | I do not use containers | NaN | NaN | Yes | Fortunately, someone else has that title | Yes | Twitter | Online | Username | 2017 | A few times per month or weekly | Find answers to specific questions;Learn how t... | 3-5 times per week | Stack Overflow was much faster | 31-60 minutes | No | NaN | No, I didn't know that Stack Overflow had a jo... | No, and I don't know what those are | Neutral | Just as welcome now as I felt last year | Tech articles written by other developers;Indu... | 14.0 | Man | No | Straight / Heterosexual | NaN | No | Appropriate in length | Neither easy nor difficult |
|---|
| 2 | I am a student who is learning to code | No | Less than once per year | The quality of OSS and closed source software ... | Not employed, but looking for work | Bosnia and Herzegovina | Yes, full-time | Secondary school (e.g. American high school, G... | NaN | Taken an online course in programming or softw... | NaN | Developer, desktop or enterprise applications;... | NaN | 17 | NaN | NaN | NaN | NaN | NaN | NaN | I am actively looking for a job | I've never had a job | NaN | NaN | Financial performance or funding status of the... | Something else changed (education, award, medi... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | C++;HTML/CSS;Python | C++;HTML/CSS;JavaScript;SQL | NaN | MySQL | Windows | Windows | Django | Django | NaN | NaN | Atom;PyCharm | Windows | I do not use containers | NaN | Useful across many domains and could change ma... | Yes | Yes | Yes | Instagram | Online | Username | 2017 | Daily or almost daily | Find answers to specific questions;Learn how t... | 3-5 times per week | Stack Overflow was much faster | 11-30 minutes | Yes | A few times per month or weekly | No, I knew that Stack Overflow had a job board... | No, and I don't know what those are | Yes, somewhat | Just as welcome now as I felt last year | Tech articles written by other developers;Indu... | 19.0 | Man | No | Straight / Heterosexual | NaN | No | Appropriate in length | Neither easy nor difficult |
|---|
| 3 | I am not primarily a developer, but I write co... | Yes | Never | The quality of OSS and closed source software ... | Employed full-time | Thailand | No | Bachelor’s degree (BA, BS, B.Eng., etc.) | Web development or web design | Taught yourself a new language, framework, or ... | 100 to 499 employees | Designer;Developer, back-end;Developer, front-... | 3 | 22 | 1 | Slightly satisfied | Slightly satisfied | Not at all confident | Not sure | Not sure | I’m not actively looking, but I am open to new... | 1-2 years ago | Interview with people in peer roles | No | Languages, frameworks, and other technologies ... | I was preparing for a job search | THB | Thai baht | 23000.0 | Monthly | 8820.0 | 40.0 | There's no schedule or spec; I work on what se... | Distracting work environment;Inadequate access... | Less than once per month / Never | Home | Average | No | NaN | No, but I think we should | Not sure | I have little or no influence | HTML/CSS | Elixir;HTML/CSS | PostgreSQL | PostgreSQL | NaN | NaN | NaN | Other(s): | NaN | NaN | Vim;Visual Studio Code | Linux-based | I do not use containers | NaN | NaN | Yes | Yes | Yes | Reddit | In real life (in person) | Username | 2011 | A few times per week | Find answers to specific questions;Learn how t... | 6-10 times per week | They were about the same | NaN | Yes | Less than once per month or monthly | Yes | No, I've heard of them, but I am not part of a... | Neutral | Just as welcome now as I felt last year | Tech meetups or events in your area;Courses on... | 28.0 | Man | No | Straight / Heterosexual | NaN | Yes | Appropriate in length | Neither easy nor difficult |
|---|
df['ConvertedComp'].head(10)
Respondent
1 NaN
2 NaN
3 8820.0
4 61000.0
5 NaN
6 366420.0
7 NaN
8 NaN
9 95179.0
10 13293.0
Name: ConvertedComp, dtype: float64
Aggregate functions
df['ConvertedComp'].median()
57287.0
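A minimal sketch of why the median is a useful summary for salary-like data. The numbers below are a small hypothetical sample, not the survey file; the point is that `median()` skips NaN by default and is far less sensitive to one very large value than `mean()`.

```python
import numpy as np
import pandas as pd

# Hypothetical salaries; NaN marks respondents who gave no answer.
salaries = pd.Series([8820.0, 61000.0, np.nan, 366420.0, np.nan, 95179.0])

median_salary = salaries.median()  # NaN is skipped by default (skipna=True)
mean_salary = salaries.mean()      # the 366420.0 outlier pulls the mean upward

print(median_salary)  # 78089.5
print(mean_salary)    # 132854.75
```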
df.median(numeric_only=True)  # restrict the reduction to numeric columns; calling df.median() on mixed columns is deprecated and raises TypeError in newer pandas
CompTotal 62000.0
ConvertedComp 57287.0
WorkWeekHrs 40.0
CodeRevHrs 4.0
Age 29.0
dtype: float64
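Two ways to restrict a reduction to numeric columns, sketched on a hypothetical three-row frame (the column names mirror the survey, the values are made up). Both should give the same result; which one reads better is a matter of taste.

```python
import numpy as np
import pandas as pd

# Hypothetical mini survey: one text column and two numeric columns.
survey = pd.DataFrame({
    "Country": ["India", "Germany", "India"],
    "ConvertedComp": [13293.0, np.nan, 8820.0],
    "Age": [28.0, 22.0, 30.0],
})

# Option 1: ask pandas to skip non-numeric columns.
medians_a = survey.median(numeric_only=True)

# Option 2: select the numeric columns explicitly, then reduce.
medians_b = survey.select_dtypes(include="number").median()

print(medians_a)
```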
df.describe()
| | CompTotal | ConvertedComp | WorkWeekHrs | CodeRevHrs | Age |
|---|---|---|---|---|---|
| count | 5.594500e+04 | 5.582300e+04 | 64503.000000 | 49790.000000 | 79210.000000 |
| mean | 5.519014e+11 | 1.271107e+05 | 42.127197 | 5.084308 | 30.336699 |
| std | 7.331926e+13 | 2.841523e+05 | 37.287610 | 5.513931 | 9.178390 |
| min | 0.000000e+00 | 0.000000e+00 | 1.000000 | 0.000000 | 1.000000 |
| 25% | 2.000000e+04 | 2.577750e+04 | 40.000000 | 2.000000 | 24.000000 |
| 50% | 6.200000e+04 | 5.728700e+04 | 40.000000 | 4.000000 | 29.000000 |
| 75% | 1.200000e+05 | 1.000000e+05 | 44.750000 | 6.000000 | 35.000000 |
| max | 1.000000e+16 | 2.000000e+06 | 4850.000000 | 99.000000 | 99.000000 |
df['ConvertedComp'].count()  # count() ignores NaN values
55823
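To make the difference between "rows" and "answers" concrete, here is a small sketch on a hypothetical Series: `count()` returns only the non-missing entries, `len()` returns every row, and `isna().sum()` gives the gap between the two.

```python
import numpy as np
import pandas as pd

# Hypothetical compensation column with two missing answers.
comp = pd.Series([np.nan, 8820.0, 61000.0, np.nan, 95179.0])

non_missing = comp.count()   # entries that are not NaN
total_rows = len(comp)       # every row, missing or not
missing = comp.isna().sum()  # number of NaN entries

print(non_missing, total_rows, missing)  # 3 5 2
```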
df['Hobbyist']
Respondent
1 Yes
2 No
3 Yes
4 No
5 Yes
...
88377 Yes
88601 No
88802 No
88816 No
88863 Yes
Name: Hobbyist, Length: 88883, dtype: object
df['Hobbyist'].value_counts()
Yes 71257
No 17626
Name: Hobbyist, dtype: int64
df['SocialMedia']
Respondent
1 Twitter
2 Instagram
3 Reddit
4 Reddit
5 Facebook
...
88377 YouTube
88601 NaN
88802 NaN
88816 NaN
88863 WhatsApp
Name: SocialMedia, Length: 88883, dtype: object
schema_df = pd.read_csv('data/survey_results_schema.csv',index_col='Column')
schema_df.loc['SocialMedia','QuestionText']
'What social media site do you use the most?'
df['SocialMedia'].value_counts()
Reddit 14374
YouTube 13830
WhatsApp 13347
Facebook 13178
Twitter 11398
Instagram 6261
I don't use social media 5554
LinkedIn 4501
WeChat 微信 667
Snapchat 628
VK ВКонта́кте 603
Weibo 新浪微博 56
Youku Tudou 优酷 21
Hello 19
Name: SocialMedia, dtype: int64
df['SocialMedia'].value_counts(normalize=True)  # relative frequencies (fractions of the non-missing answers, summing to 1)
Reddit 0.170233
YouTube 0.163791
WhatsApp 0.158071
Facebook 0.156069
Twitter 0.134988
Instagram 0.074150
I don't use social media 0.065777
LinkedIn 0.053306
WeChat 微信 0.007899
Snapchat 0.007437
VK ВКонта́кте 0.007141
Weibo 新浪微博 0.000663
Youku Tudou 优酷 0.000249
Hello 0.000225
Name: SocialMedia, dtype: float64
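A short sketch of what `normalize=True` divides by, using a hypothetical five-row Series. By default NaN answers are excluded before the fractions are computed; passing `dropna=False` keeps them in the denominator. Multiplying by 100 turns the fractions into percentages.

```python
import numpy as np
import pandas as pd

# Hypothetical answers; one respondent skipped the question.
social = pd.Series(["Reddit", "YouTube", "Reddit", np.nan, "Twitter"])

counts = social.value_counts()                # NaN excluded by default
shares = social.value_counts(normalize=True)  # fractions of the 4 non-missing answers
shares_with_nan = social.value_counts(normalize=True, dropna=False)  # fractions of all 5 rows
percent = (shares * 100).round(1)

print(percent)
```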
Grouping functions
df['Country'].value_counts()
United States 20949
India 9061
Germany 5866
United Kingdom 5737
Canada 3395
...
Tonga 1
Timor-Leste 1
North Korea 1
Brunei Darussalam 1
Chad 1
Name: Country, Length: 179, dtype: int64
country_grp = df.groupby('Country')  # a plain string key keeps get_group('United States') working on newer pandas versions
country_grp.get_group('United States')
| MainBranch | Hobbyist | OpenSourcer | OpenSource | Employment | Country | Student | EdLevel | UndergradMajor | EduOther | OrgSize | DevType | YearsCode | Age1stCode | YearsCodePro | CareerSat | JobSat | MgrIdiot | MgrMoney | MgrWant | JobSeek | LastHireDate | LastInt | FizzBuzz | JobFactors | ResumeUpdate | CurrencySymbol | CurrencyDesc | CompTotal | CompFreq | ConvertedComp | WorkWeekHrs | WorkPlan | WorkChallenge | WorkRemote | WorkLoc | ImpSyn | CodeRev | CodeRevHrs | UnitTests | PurchaseHow | PurchaseWhat | LanguageWorkedWith | LanguageDesireNextYear | DatabaseWorkedWith | DatabaseDesireNextYear | PlatformWorkedWith | PlatformDesireNextYear | WebFrameWorkedWith | WebFrameDesireNextYear | MiscTechWorkedWith | MiscTechDesireNextYear | DevEnviron | OpSys | Containers | BlockchainOrg | BlockchainIs | BetterLife | ITperson | OffOn | SocialMedia | Extraversion | ScreenName | SOVisit1st | SOVisitFreq | SOVisitTo | SOFindAnswer | SOTimeSaved | SOHowMuchTime | SOAccount | SOPartFreq | SOJobs | EntTeams | SOComm | WelcomeChange | SONewContent | Age | Gender | Trans | Sexuality | Ethnicity | Dependents | SurveyLength | SurveyEase |
|---|
| Respondent | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
|---|
| 4 | I am a developer by profession | No | Never | The quality of OSS and closed source software ... | Employed full-time | United States | No | Bachelor’s degree (BA, BS, B.Eng., etc.) | Computer science, computer engineering, or sof... | Taken an online course in programming or softw... | 100 to 499 employees | Developer, full-stack | 3 | 16 | Less than 1 year | Very satisfied | Slightly satisfied | Very confident | No | Not sure | I am not interested in new job opportunities | Less than a year ago | Write code by hand (e.g., on a whiteboard);Int... | No | Languages, frameworks, and other technologies ... | I was preparing for a job search | USD | United States dollar | 61000.0 | Yearly | 61000.0 | 80.0 | There's no schedule or spec; I work on what se... | NaN | Less than once per month / Never | Home | A little below average | No | NaN | No, but I think we should | Developers typically have the most influence o... | I have little or no influence | C;C++;C#;Python;SQL | C;C#;JavaScript;SQL | MySQL;SQLite | MySQL;SQLite | Linux;Windows | Linux;Windows | NaN | NaN | .NET | .NET | Eclipse;Vim;Visual Studio;Visual Studio Code | Windows | I do not use containers | Not at all | Useful for decentralized currency (i.e., Bitcoin) | Yes | SIGH | Yes | Reddit | In real life (in person) | Username | 2014 | Daily or almost daily | Find answers to specific questions;Pass the ti... | 1-2 times per week | Stack Overflow was much faster | 31-60 minutes | Yes | Less than once per month or monthly | Yes | No, and I don't know what those are | No, not really | Just as welcome now as I felt last year | Tech articles written by other developers;Indu... | 22.0 | Man | No | Straight / Heterosexual | White or of European descent | No | Appropriate in length | Easy |
|---|
| 13 | I am a developer by profession | Yes | Less than once a month but more than once per ... | OSS is, on average, of HIGHER quality than pro... | Employed full-time | United States | No | Master’s degree (MA, MS, M.Eng., MBA, etc.) | Computer science, computer engineering, or sof... | Taken an online course in programming or softw... | 10 to 19 employees | Data or business analyst;Database administrato... | 17 | 11 | 8 | Very satisfied | Very satisfied | NaN | NaN | NaN | I am not interested in new job opportunities | 3-4 years ago | Complete a take-home project;Interview with pe... | Yes | Languages, frameworks, and other technologies ... | I was preparing for a job search | USD | United States dollar | 90000.0 | Yearly | 90000.0 | 40.0 | There is a schedule and/or spec (made by me or... | Meetings;Non-work commitments (parenting, scho... | All or almost all the time (I'm full-time remote) | Home | A little above average | Yes, because I see value in code review | 5.0 | No, but I think we should | Developers and management have nearly equal in... | I have a great deal of influence | Bash/Shell/PowerShell;HTML/CSS;JavaScript;PHP;... | Bash/Shell/PowerShell;HTML/CSS;JavaScript;Rust... | Couchbase;DynamoDB;Firebase;MySQL | Firebase;MySQL;Redis | Android;AWS;Docker;IBM Cloud or Watson;iOS;Lin... | Android;AWS;Docker;IBM Cloud or Watson;Linux;S... | Angular/Angular.js;ASP.NET;Express;jQuery;Vue.js | Express;Vue.js | Node.js;Xamarin | Node.js;TensorFlow | Vim;Visual Studio;Visual Studio Code;Xcode | Windows | Development;Testing;Production | Not at all | Useful for decentralized currency (i.e., Bitcoin) | Yes | Yes | Yes | Twitter | In real life (in person) | Username | 2011 | Multiple times per day | Find answers to specific questions | More than 10 times per week | Stack Overflow was much faster | 11-30 minutes | Yes | Less than once per month or monthly | Yes | No, I've heard of them, but I am not part of a... 
| Neutral | Somewhat more welcome now than last year | Tech articles written by other developers;Cour... | 28.0 | Man | No | Straight / Heterosexual | White or of European descent | Yes | Appropriate in length | Easy |
|---|
| 22 | I am a developer by profession | Yes | Less than once per year | OSS is, on average, of HIGHER quality than pro... | Employed full-time | United States | No | Some college/university study without earning ... | NaN | Taken an online course in programming or softw... | 10,000 or more employees | Data or business analyst;Designer;Developer, b... | 35 | 12 | 18 | Slightly satisfied | Very dissatisfied | Somewhat confident | No | No | I’m not actively looking, but I am open to new... | More than 4 years ago | Interview with people in senior / management r... | No | Industry that I'd be working in;Financial perf... | I had a negative experience or interaction at ... | USD | United States dollar | 103000.0 | Yearly | 103000.0 | 40.0 | There is a schedule and/or spec (made by me or... | Being tasked with non-development work;Meeting... | Less than half the time, but at least one day ... | Home | Average | No | NaN | No, but I think we should | The CTO, CIO, or other management purchase new... | I have little or no influence | Bash/Shell/PowerShell;C++;HTML/CSS;JavaScript;... | Bash/Shell/PowerShell;C++;HTML/CSS;JavaScript;... | Elasticsearch;MySQL;Oracle;Redis | Elasticsearch;MySQL;Oracle;Redis | Docker;Linux;Raspberry Pi;Windows | Docker;Linux;Raspberry Pi;Windows | Angular/Angular.js;Ruby on Rails | Angular/Angular.js;Ruby on Rails | Node.js | Node.js | Sublime Text;Visual Studio;Visual Studio Code | Windows | Outside of work, for personal projects | Not at all | NaN | Yes | Yes | Yes | Instagram | Online | Username | I don't remember | Daily or almost daily | Find answers to specific questions | 3-5 times per week | Stack Overflow was much faster | 0-10 minutes | Yes | A few times per week | Yes | No, and I don't know what those are | Yes, somewhat | Just as welcome now as I felt last year | Tech articles written by other developers;Indu... | 47.0 | Man | No | Straight / Heterosexual | White or of European descent | Yes | Appropriate in length | Easy |
|---|
| 23 | I am a developer by profession | Yes | Less than once per year | The quality of OSS and closed source software ... | Employed full-time | United States | No | Bachelor’s degree (BA, BS, B.Eng., etc.) | Information systems, information technology, o... | Taken an online course in programming or softw... | 10,000 or more employees | Developer, full-stack | 3 | 19 | 1 | Slightly satisfied | Slightly satisfied | Very confident | No | Not sure | I’m not actively looking, but I am open to new... | Less than a year ago | Write any code;Write code by hand (e.g., on a ... | No | Opportunities for professional development;How... | I was preparing for a job search | USD | United States dollar | 69000.0 | Yearly | 69000.0 | 40.0 | There is a schedule and/or spec (made by me or... | Distracting work environment;Meetings;Non-work... | A few days each month | Office | Average | Yes, because I see value in code review | 8.0 | Yes, it's part of our process | Developers and management have nearly equal in... | I have little or no influence | Bash/Shell/PowerShell;HTML/CSS;JavaScript;Pyth... | Bash/Shell/PowerShell;Go;HTML/CSS;Java;JavaScr... | Oracle;SQLite | Couchbase;DynamoDB;Elasticsearch;Firebase;Oracle | Docker;Google Cloud Platform | Docker;iOS;Slack | React.js;Ruby on Rails | Express;React.js;Ruby on Rails;Vue.js | NaN | React Native;TensorFlow | Visual Studio Code | MacOS | Development;Testing;Production | NaN | Useful for immutable record keeping outside of... | Yes | SIGH | Yes | Reddit | In real life (in person) | Username | 2014 | Multiple times per day | Find answers to specific questions;Learn how t... | 6-10 times per week | They were about the same | NaN | Yes | I have never participated in Q&A on Stack Over... | Yes | No, I've heard of them, but I am not part of a... | No, not really | Just as welcome now as I felt last year | Tech articles written by other developers;Tech... 
| 22.0 | Man | No | Straight / Heterosexual | Black or of African descent | No | Appropriate in length | Easy |
|---|
| 26 | I am a developer by profession | Yes | Less than once per year | The quality of OSS and closed source software ... | Employed full-time | United States | No | Some college/university study without earning ... | Computer science, computer engineering, or sof... | Taught yourself a new language, framework, or ... | 10,000 or more employees | Designer;Developer, back-end;Developer, deskto... | 12 | 8 | 8 | Very satisfied | Very satisfied | NaN | NaN | NaN | I’m not actively looking, but I am open to new... | Less than a year ago | Interview with people in peer roles;Interview ... | No | Remote work options;Diversity of the company o... | I was preparing for a job search | USD | United States dollar | 114000.0 | Yearly | 114000.0 | 40.0 | There is a schedule and/or spec (made by me or... | Being tasked with non-development work;Meeting... | Less than half the time, but at least one day ... | Home | Far above average | Yes, because I see value in code review | 2.0 | Yes, it's not part of our process but the deve... | Developers typically have the most influence o... | I have a great deal of influence | Bash/Shell/PowerShell;C++;C#;HTML/CSS;JavaScri... | C#;HTML/CSS;JavaScript;Objective-C;Ruby;SQL;Sw... | Microsoft SQL Server;MySQL;Redis;SQLite | Microsoft SQL Server;MySQL;Redis;SQLite | AWS;Docker;Linux;MacOS;Microsoft Azure;Windows... | Android;Docker;iOS;Linux;MacOS;Microsoft Azure... | Angular/Angular.js;ASP.NET;Drupal;Express;jQue... | Angular/Angular.js;ASP.NET | .NET;.NET Core;Node.js;Xamarin | .NET;.NET Core;Node.js | Notepad++;Sublime Text;Vim;Visual Studio;Xcode | MacOS | Development;Testing | Not at all | A passing fad | Yes | SIGH | Yes | I don't use social media | In real life (in person) | Username | 2008 | Daily or almost daily | Find answers to specific questions;Learn how t... | 3-5 times per week | Stack Overflow was much faster | 11-30 minutes | Yes | Less than once per month or monthly | Yes | No, I've heard of them, but I am not part of a... 
| Neutral | Just as welcome now as I felt last year | NaN | 34.0 | Man | No | Gay or Lesbian | NaN | No | Appropriate in length | Easy |
|---|
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
|---|
| 78292 | NaN | No | Once a month or more often | OSS is, on average, of HIGHER quality than pro... | Independent contractor, freelancer, or self-em... | United States | No | Other doctoral degree (Ph.D, Ed.D., etc.) | A health science (ex. nursing, pharmacy, radio... | Completed an industry certification program (e... | Just me - I am a freelancer, sole proprietor, ... | Academic researcher | 42 | 14 | 31 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Bash/Shell/PowerShell;C;Python | Bash/Shell/PowerShell;C;Python | SQLite | SQLite | Linux;Raspberry Pi;Other(s): | Linux;Raspberry Pi;Other(s): | NaN | NaN | Chef | NaN | Emacs;IPython / Jupyter | Linux-based | I do not use containers | NaN | Useful for immutable record keeping outside of... | No | Yes | Yes | I don't use social media | In real life (in person) | NaN | 2013 | A few times per week | Find answers to specific questions | Less than once per week | The other resource was slightly faster | 11-30 minutes | Not sure / can't remember | NaN | No, I didn't know that Stack Overflow had a jo... | No, and I don't know what those are | No, not really | Somewhat less welcome now than last year | NaN | 60.0 | Man | No | Straight / Heterosexual | White or of European descent | Yes | Too long | Neither easy nor difficult |
|---|
| 82717 | NaN | No | Less than once per year | The quality of OSS and closed source software ... | Not employed, but looking for work | United States | No | Secondary school (e.g. American high school, G... | NaN | NaN | NaN | NaN | Less than 1 year | NaN | Less than 1 year | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Android;Windows | Android;Microsoft Azure;Windows | NaN | NaN | NaN | NaN | NaN | MacOS | Testing | NaN | NaN | No | SIGH | Yes | Facebook | In real life (in person) | Username | 2018 | Less than once per month or monthly | Find answers to specific questions | Less than once per week | NaN | 60+ minutes | No | NaN | No, I knew that Stack Overflow had a job board... | No, I've heard of them, but I am not part of a... | Not sure | NaN | Industry news about technologies you're intere... | 44.0 | Man | No | Straight / Heterosexual | White or of European descent | Yes | Appropriate in length | Neither easy nor difficult |
|---|
| 83397 | NaN | Yes | Less than once per year | NaN | Not employed, but looking for work | United States | No | Bachelor’s degree (BA, BS, B.Eng., etc.) | Computer science, computer engineering, or sof... | Taken an online course in programming or softw... | NaN | NaN | 12 | 9 | Less than 1 year | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | HTML/CSS;JavaScript;Python;SQL | C;C++;C#;Go;Java;JavaScript;Python;R;Ruby;SQL;... | NaN | NaN | Android;Arduino;Slack | Android;Arduino;Docker;iOS;Raspberry Pi;Slack | Flask | Django;Drupal;Flask;jQuery;React.js | NaN | Chef;Torch/PyTorch | Eclipse;IPython / Jupyter;Sublime Text | MacOS | I do not use containers | NaN | NaN | NaN | SIGH | Yes | NaN | NaN | Handle | I don't remember | A few times per week | Find answers to specific questions;Learn how t... | 3-5 times per week | They were about the same | NaN | Not sure / can't remember | NaN | Yes | No, and I don't know what those are | No, not at all | Just as welcome now as I felt last year | NaN | 27.0 | Woman | No | Bisexual | White or of European descent | No | Appropriate in length | Easy |
|---|
| 85642 | NaN | No | Less than once per year | OSS is, on average, of LOWER quality than prop... | Independent contractor, freelancer, or self-em... | United States | No | Associate degree | Information systems, information technology, o... | Taken an online course in programming or softw... | Just me - I am a freelancer, sole proprietor, ... | Designer;Marketing or sales professional | 20 | 7 | Less than 1 year | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Go;HTML/CSS | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Visual Studio Code | Windows | I do not use containers | NaN | Useful for immutable record keeping outside of... | No | SIGH | Yes | NaN | In real life (in person) | Handle | 2008 | Less than once per month or monthly | Find answers to specific questions | Less than once per week | Stack Overflow was slightly faster | 60+ minutes | Yes | I have never participated in Q&A on Stack Over... | No, I knew that Stack Overflow had a job board... | No, and I don't know what those are | No, not at all | Just as welcome now as I felt last year | Tech articles written by other developers;Indu... | 34.0 | Non-binary, genderqueer, or gender non-conforming | NaN | Bisexual;Gay or Lesbian | White or of European descent | No | Appropriate in length | Easy |
|---|
| 88282 | NaN | Yes | Once a month or more often | The quality of OSS and closed source software ... | Not employed, but looking for work | United States | No | Some college/university study without earning ... | Computer science, computer engineering, or sof... | Taught yourself a new language, framework, or ... | NaN | Developer, back-end;Developer, desktop or ente... | 38 | 10 | 38 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Bash/Shell/PowerShell;Go;HTML/CSS;JavaScript;W... | Bash/Shell/PowerShell;C;Go;HTML/CSS;JavaScript... | NaN | NaN | Linux | Linux;Raspberry Pi | React.js | Vue.js | Node.js | Ansible | Vim | Linux-based | I do not use containers | NaN | An irresponsible use of resources | No | NaN | Yes | I don't use social media | In real life (in person) | Username | I don't remember | A few times per month or weekly | Find answers to specific questions | 1-2 times per week | They were about the same | NaN | Yes | I have never participated in Q&A on Stack Over... | Yes | No, and I don't know what those are | No, not really | Just as welcome now as I felt last year | NaN | NaN | Man | No | Straight / Heterosexual | NaN | No | Too short | Neither easy nor difficult |
|---|
20949 rows × 84 columns
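`get_group` returns the same rows a boolean filter on the grouping column would. The sketch below checks that on a hypothetical four-row frame (the respondent IDs and answers are made up for illustration).

```python
import pandas as pd

# Hypothetical slice of the survey, indexed by respondent ID.
survey = pd.DataFrame(
    {
        "Country": ["United States", "India", "United States", "India"],
        "SocialMedia": ["Reddit", "WhatsApp", "Twitter", "YouTube"],
    },
    index=pd.Index([4, 8, 13, 10], name="Respondent"),
)

country_grp = survey.groupby("Country")

via_group = country_grp.get_group("United States")
via_filter = survey.loc[survey["Country"] == "United States"]

print(via_group)
```

The advantage of the groupby object is that the split is computed once and can then be reused for every country, rather than writing one filter per country.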
filt = df['Country'] == 'India'
df.loc[filt,'SocialMedia'].value_counts()
WhatsApp 2990
YouTube 1820
LinkedIn 955
Facebook 841
Instagram 822
Twitter 542
Reddit 473
I don't use social media 250
Snapchat 23
Hello 5
WeChat 微信 5
VK ВКонта́кте 4
Youku Tudou 优酷 2
Weibo 新浪微博 1
Name: SocialMedia, dtype: int64
country_grp['SocialMedia'].value_counts()
Country SocialMedia
Afghanistan Facebook 15
YouTube 9
I don't use social media 6
WhatsApp 4
Instagram 1
..
Zimbabwe Facebook 3
YouTube 3
Instagram 2
LinkedIn 2
Reddit 1
Name: SocialMedia, Length: 1220, dtype: int64
country_grp['SocialMedia'].value_counts().loc['India']
SocialMedia
WhatsApp 2990
YouTube 1820
LinkedIn 955
Facebook 841
Instagram 822
Twitter 542
Reddit 473
I don't use social media 250
Snapchat 23
Hello 5
WeChat 微信 5
VK ВКонта́кте 4
Youku Tudou 优酷 2
Weibo 新浪微博 1
Name: SocialMedia, dtype: int64
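Calling `value_counts()` on a grouped column yields a Series with a two-level index (group, value), which is why `.loc['India']` can pull out one country's counts. A small sketch on hypothetical data, also showing per-group fractions with `normalize=True`:

```python
import pandas as pd

# Hypothetical answers: three respondents from India, one from the United States.
survey = pd.DataFrame({
    "Country": ["India", "India", "India", "United States"],
    "SocialMedia": ["WhatsApp", "WhatsApp", "YouTube", "Reddit"],
})

country_social = survey.groupby("Country")["SocialMedia"].value_counts()
india = country_social.loc["India"]  # first index level selected, second level remains

# Fractions are computed within each country, not over the whole frame.
country_share = survey.groupby("Country")["SocialMedia"].value_counts(normalize=True)

print(india)
```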
country_grp['ConvertedComp'].median()
Country
Afghanistan 6222.0
Albania 10818.0
Algeria 7878.0
Andorra 160931.0
Angola 7764.0
...
Venezuela, Bolivarian Republic of... 6384.0
Viet Nam 11892.0
Yemen 11940.0
Zambia 5040.0
Zimbabwe 19200.0
Name: ConvertedComp, Length: 179, dtype: float64
country_grp['ConvertedComp'].agg(['median','mean'])
| Country | median | mean |
|---|---|---|
| Afghanistan | 6222.0 | 101953.333333 |
| Albania | 10818.0 | 21833.700000 |
| Algeria | 7878.0 | 34924.047619 |
| Andorra | 160931.0 | 160931.000000 |
| Angola | 7764.0 | 7764.000000 |
| ... | ... | ... |
| Venezuela, Bolivarian Republic of... | 6384.0 | 14581.627907 |
| Viet Nam | 11892.0 | 17233.436782 |
| Yemen | 11940.0 | 16909.166667 |
| Zambia | 5040.0 | 10075.375000 |
| Zimbabwe | 19200.0 | 34046.666667 |
179 rows × 2 columns
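A sketch of `agg` on a grouped column, using hypothetical numbers. Adding `'count'` next to the median and mean shows how many answers each figure rests on, which helps when a country has only a handful of respondents. The keyword form (named aggregation) lets you choose the output column names; `median_comp` and `n_answers` are names invented for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical compensation data for two countries.
survey = pd.DataFrame({
    "Country": ["India", "India", "India", "Germany", "Germany"],
    "ConvertedComp": [10000.0, 20000.0, 90000.0, 60000.0, np.nan],
})

comp_by_country = survey.groupby("Country")["ConvertedComp"]

# List form: one output column per function name.
summary = comp_by_country.agg(["median", "mean", "count"])

# Keyword form: output column name = function name.
named = comp_by_country.agg(median_comp="median", n_answers="count")

print(summary)
```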
filt = df['Country'] == 'India'
df.loc[filt,'LanguageWorkedWith'].str.contains('Python')
Respondent
8 True
10 True
15 False
50 True
65 False
...
77339 False
79795 True
83862 False
84299 True
86012 False
Name: LanguageWorkedWith, Length: 9061, dtype: object
df.loc[filt,'LanguageWorkedWith'].str.contains('Python').sum()  # count how many respondents use Python
3105
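country_grp['LanguageWorkedWith'].apply(lambda x:x.str.contains('Python').sum())  # here x is the LanguageWorkedWith Series for one country (e.g. India above); compare with the India example to see what happens per group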
Country
Afghanistan 8
Albania 23
Algeria 40
Andorra 0
Angola 2
..
Venezuela, Bolivarian Republic of... 28
Viet Nam 78
Yemen 3
Zambia 4
Zimbabwe 14
Name: LanguageWorkedWith, Length: 179, dtype: int64
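The per-group count can also be written with a named function instead of a lambda. The sketch below uses hypothetical data and passes `na=False` to `str.contains`, which treats a missing answer as "does not use Python" so the intermediate result is a clean boolean Series. `count_python` is a helper name made up for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical answers; one respondent left the language question blank.
survey = pd.DataFrame({
    "Country": ["India", "India", "India", "Germany"],
    "LanguageWorkedWith": ["C;Python", np.nan, "Java", "Python;SQL"],
})

def count_python(languages: pd.Series) -> int:
    """Count the entries in one group's Series that mention Python."""
    # na=False maps missing answers to False instead of NaN.
    return int(languages.str.contains("Python", na=False).sum())

python_users = survey.groupby("Country")["LanguageWorkedWith"].apply(count_python)

print(python_users)
```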
country_respondents = df['Country'].value_counts()
country_respondents
United States 20949
India 9061
Germany 5866
United Kingdom 5737
Canada 3395
...
Tonga 1
Timor-Leste 1
North Korea 1
Brunei Darussalam 1
Chad 1
Name: Country, Length: 179, dtype: int64
country_use_python = country_grp['LanguageWorkedWith'].apply(lambda x:x.str.contains('Python').sum())
country_use_python
Country
Afghanistan 8
Albania 23
Algeria 40
Andorra 0
Angola 2
..
Venezuela, Bolivarian Republic of... 28
Viet Nam 78
Yemen 3
Zambia 4
Zimbabwe 14
Name: LanguageWorkedWith, Length: 179, dtype: int64
concat
python_df = pd.concat([country_respondents,country_use_python],axis='columns')
python_df
| | Country | LanguageWorkedWith |
|---|---|---|
| United States | 20949 | 10083 |
| India | 9061 | 3105 |
| Germany | 5866 | 2451 |
| United Kingdom | 5737 | 2384 |
| Canada | 3395 | 1558 |
| ... | ... | ... |
| Tonga | 1 | 0 |
| Timor-Leste | 1 | 1 |
| North Korea | 1 | 0 |
| Brunei Darussalam | 1 | 0 |
| Chad | 1 | 0 |
179 rows × 2 columns
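`pd.concat(..., axis='columns')` lines the Series up by index label, not by position, so the two inputs do not need to be in the same order. A sketch with hypothetical counts; the Series are named explicitly here so the resulting column names do not depend on whatever names pandas assigns to `value_counts()` or `apply()` results.

```python
import pandas as pd

# Hypothetical per-country counts, deliberately in different orders.
respondents = pd.Series(
    {"United States": 3, "India": 2, "Chad": 1}, name="NumRespondents"
)
know_python = pd.Series(
    {"Chad": 0, "India": 1, "United States": 2}, name="NumKnowPython"
)

# Rows are matched on the country label; each Series becomes one column.
combined = pd.concat([respondents, know_python], axis="columns")

print(combined)
```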
python_df.rename(columns={'Country':'NumRespondents','LanguageWorkedWith':'NumKnowPython'},inplace=True)
python_df
| | NumRespondents | NumKnowPython |
|---|---|---|
| United States | 20949 | 10083 |
| India | 9061 | 3105 |
| Germany | 5866 | 2451 |
| United Kingdom | 5737 | 2384 |
| Canada | 3395 | 1558 |
| ... | ... | ... |
| Tonga | 1 | 0 |
| Timor-Leste | 1 | 1 |
| North Korea | 1 | 0 |
| Brunei Darussalam | 1 | 0 |
| Chad | 1 | 0 |
179 rows × 2 columns
python_df['PctKnowPython'] = (python_df['NumKnowPython']/python_df['NumRespondents']) * 100
python_df
| | NumRespondents | NumKnowPython | PctKnowPython |
|---|---|---|---|
| United States | 20949 | 10083 | 48.131176 |
| India | 9061 | 3105 | 34.267741 |
| Germany | 5866 | 2451 | 41.783157 |
| United Kingdom | 5737 | 2384 | 41.554820 |
| Canada | 3395 | 1558 | 45.891016 |
| ... | ... | ... | ... |
| Tonga | 1 | 0 | 0.000000 |
| Timor-Leste | 1 | 1 | 100.000000 |
| North Korea | 1 | 0 | 0.000000 |
| Brunei Darussalam | 1 | 0 | 0.000000 |
| Chad | 1 | 0 | 0.000000 |
179 rows × 3 columns
python_df.sort_values(by='PctKnowPython',ascending=False,inplace=True)
python_df
| | NumRespondents | NumKnowPython | PctKnowPython |
|---|---|---|---|
| Sao Tome and Principe | 1 | 1 | 100.000000 |
| Timor-Leste | 1 | 1 | 100.000000 |
| Dominica | 1 | 1 | 100.000000 |
| Niger | 1 | 1 | 100.000000 |
| Turkmenistan | 7 | 6 | 85.714286 |
| ... | ... | ... | ... |
| Cape Verde | 3 | 0 | 0.000000 |
| Lao People's Democratic Republic | 3 | 0 | 0.000000 |
| Malawi | 2 | 0 | 0.000000 |
| Liberia | 2 | 0 | 0.000000 |
| Chad | 1 | 0 | 0.000000 |
179 rows × 3 columns
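Notes
count(), median(), and sum() are aggregation methods, so they must be called with parentheses.
count() ignores missing values.
sum() on a boolean Series counts only the True values.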
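The notes above can be checked on a tiny Series. This is a minimal sketch on hypothetical data (not the survey file); passing `na=False` to `str.contains` is a choice made here so that the mask is purely boolean:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data in the same 'a;b;c' format as LanguageWorkedWith
langs = pd.Series(['Python;SQL', 'Java', np.nan, 'C;Python'])

non_missing = langs.count()  # count() skips the NaN entry

# na=False turns the missing entry into False instead of NaN
uses_python = langs.str.contains('Python', na=False)
python_total = uses_python.sum()  # True counts as 1, False as 0

print(non_missing, python_total)
```

Without the parentheses, `langs.count` is only a reference to the method and nothing is computed.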
Data Cleaning
import pandas as pd
import numpy as np
people = {
"first": ["Corey", 'Jane', 'John','Chris',np.nan,None,'NA'],
"last": ["Schafer", 'Doe', 'Doe','Schafer',np.nan,np.nan,'Missing'],
"email": ["CoreyMSchafer@gmail", 'JaneDoe@email', 'JohnDoe@email',None,np.nan,'Anony@email','NA'],
'age':['33','55','63','36',None,None,'Missing']
}
df = pd.DataFrame(people)
df.replace('NA',np.nan,inplace=True)
df.replace('Missing',np.nan,inplace=True)
df
| | first | last | email | age |
|---|---|---|---|---|
| 0 | Corey | Schafer | CoreyMSchafer@gmail | 33 |
| 1 | Jane | Doe | JaneDoe@email | 55 |
| 2 | John | Doe | JohnDoe@email | 63 |
| 3 | Chris | Schafer | None | 36 |
| 4 | NaN | NaN | NaN | None |
| 5 | None | NaN | Anony@email | None |
| 6 | NaN | NaN | NaN | NaN |
df.dropna()  # drops rows by default
| | first | last | email | age |
|---|---|---|---|---|
| 0 | Corey | Schafer | CoreyMSchafer@gmail | 33 |
| 1 | Jane | Doe | JaneDoe@email | 55 |
| 2 | John | Doe | JohnDoe@email | 63 |
df.dropna(axis='index',how='any')  # these are the dropna() defaults; how='any' drops a row if it contains at least one missing value
| | first | last | email | age |
|---|---|---|---|---|
| 0 | Corey | Schafer | CoreyMSchafer@gmail | 33 |
| 1 | Jane | Doe | JaneDoe@email | 55 |
| 2 | John | Doe | JohnDoe@email | 63 |
df.dropna(axis='index',how='all')  # drop a row only if every value in it is missing
| | first | last | email | age |
|---|---|---|---|---|
| 0 | Corey | Schafer | CoreyMSchafer@gmail | 33 |
| 1 | Jane | Doe | JaneDoe@email | 55 |
| 2 | John | Doe | JohnDoe@email | 63 |
| 3 | Chris | Schafer | None | 36 |
| 5 | None | NaN | Anony@email | None |
df.dropna(axis='columns',how='all')
| | first | last | email | age |
|---|---|---|---|---|
| 0 | Corey | Schafer | CoreyMSchafer@gmail | 33 |
| 1 | Jane | Doe | JaneDoe@email | 55 |
| 2 | John | Doe | JohnDoe@email | 63 |
| 3 | Chris | Schafer | None | 36 |
| 4 | NaN | NaN | NaN | None |
| 5 | None | NaN | Anony@email | None |
| 6 | NaN | NaN | NaN | NaN |
df.dropna(axis='columns',how='any')  # every column has at least one missing value, so the result has no columns; without inplace=True, df itself is unchanged
df
| | first | last | email | age |
|---|---|---|---|---|
| 0 | Corey | Schafer | CoreyMSchafer@gmail | 33 |
| 1 | Jane | Doe | JaneDoe@email | 55 |
| 2 | John | Doe | JohnDoe@email | 63 |
| 3 | Chris | Schafer | None | 36 |
| 4 | NaN | NaN | NaN | None |
| 5 | None | NaN | Anony@email | None |
| 6 | NaN | NaN | NaN | NaN |
df.dropna(axis='index',how='any',subset=['last','email'])  # drop a row if last or email (or both) is missing
| | first | last | email | age |
|---|---|---|---|---|
| 0 | Corey | Schafer | CoreyMSchafer@gmail | 33 |
| 1 | Jane | Doe | JaneDoe@email | 55 |
| 2 | John | Doe | JohnDoe@email | 63 |
df.dropna(axis='index',how='all',subset=['last','email'])  # drop a row only if both last and email are missing
| | first | last | email | age |
|---|---|---|---|---|
| 0 | Corey | Schafer | CoreyMSchafer@gmail | 33 |
| 1 | Jane | Doe | JaneDoe@email | 55 |
| 2 | John | Doe | JohnDoe@email | 63 |
| 3 | Chris | Schafer | None | 36 |
| 5 | None | NaN | Anony@email | None |
df.isna()
| | first | last | email | age |
|---|---|---|---|---|
| 0 | False | False | False | False |
| 1 | False | False | False | False |
| 2 | False | False | False | False |
| 3 | False | False | True | False |
| 4 | True | True | True | True |
| 5 | True | True | False | True |
| 6 | True | True | True | True |
df.fillna('MISSING')
| | first | last | email | age |
|---|---|---|---|---|
| 0 | Corey | Schafer | CoreyMSchafer@gmail | 33 |
| 1 | Jane | Doe | JaneDoe@email | 55 |
| 2 | John | Doe | JohnDoe@email | 63 |
| 3 | Chris | Schafer | MISSING | 36 |
| 4 | MISSING | MISSING | MISSING | MISSING |
| 5 | MISSING | MISSING | Anony@email | MISSING |
| 6 | MISSING | MISSING | MISSING | MISSING |
df.fillna(0)
| | first | last | email | age |
|---|---|---|---|---|
| 0 | Corey | Schafer | CoreyMSchafer@gmail | 33 |
| 1 | Jane | Doe | JaneDoe@email | 55 |
| 2 | John | Doe | JohnDoe@email | 63 |
| 3 | Chris | Schafer | 0 | 36 |
| 4 | 0 | 0 | 0 | 0 |
| 5 | 0 | 0 | Anony@email | 0 |
| 6 | 0 | 0 | 0 | 0 |
df.dtypes
first object
last object
email object
age object
dtype: object
df['age'] = df['age'].astype(int)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-14-9b0df8191b9d> in <module>
----> 1 df['age'] = df['age'].astype(int)
~\AppData\Roaming\Python\Python37\site-packages\pandas\core\generic.py in astype(self, dtype, copy, errors)
5813 else:
5814 # else, only a single dtype is given
-> 5815 new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
5816 return self._constructor(new_data).__finalize__(self, method="astype")
5817
~\AppData\Roaming\Python\Python37\site-packages\pandas\core\internals\managers.py in astype(self, dtype, copy, errors)
416
417 def astype(self: T, dtype, copy: bool = False, errors: str = "raise") -> T:
--> 418 return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
419
420 def convert(
~\AppData\Roaming\Python\Python37\site-packages\pandas\core\internals\managers.py in apply(self, f, align_keys, ignore_failures, **kwargs)
325 applied = b.apply(f, **kwargs)
326 else:
--> 327 applied = getattr(b, f)(**kwargs)
328 except (TypeError, NotImplementedError):
329 if not ignore_failures:
~\AppData\Roaming\Python\Python37\site-packages\pandas\core\internals\blocks.py in astype(self, dtype, copy, errors)
589 values = self.values
590
--> 591 new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
592
593 new_values = maybe_coerce_values(new_values)
~\AppData\Roaming\Python\Python37\site-packages\pandas\core\dtypes\cast.py in astype_array_safe(values, dtype, copy, errors)
1307
1308 try:
-> 1309 new_values = astype_array(values, dtype, copy=copy)
1310 except (ValueError, TypeError):
1311 # e.g. astype_nansafe can fail on object-dtype of strings
~\AppData\Roaming\Python\Python37\site-packages\pandas\core\dtypes\cast.py in astype_array(values, dtype, copy)
1255
1256 else:
-> 1257 values = astype_nansafe(values, dtype, copy=copy)
1258
1259 # in pandas we don't store numpy str dtypes, so convert to object
~\AppData\Roaming\Python\Python37\site-packages\pandas\core\dtypes\cast.py in astype_nansafe(arr, dtype, copy, skipna)
1172 # work around NumPy brokenness, #1987
1173 if np.issubdtype(dtype.type, np.integer):
-> 1174 return lib.astype_intsafe(arr, dtype)
1175
1176 # if we have a datetime/timedelta array of objects
~\AppData\Roaming\Python\Python37\site-packages\pandas\_libs\lib.pyx in pandas._libs.lib.astype_intsafe()
TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'
type(np.nan)
float
df['age'] = df['age'].astype(float)
df.dtypes
first object
last object
email object
age float64
dtype: object
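`astype(int)` fails above because a plain integer column has no way to represent a missing value, while `np.nan` is itself a float. If whole-number ages are still wanted, one option is pandas' nullable `Int64` dtype. The sketch below uses a hypothetical sample Series that mirrors the `age` column; it is an alternative to the float cast shown above, not a step the original walkthrough takes:

```python
import pandas as pd

# Hypothetical sample mirroring the 'age' column: numeric strings plus a missing value
ages = pd.Series(['33', '55', None, '36'])

as_float = pd.to_numeric(ages)                 # float64, the missing value becomes NaN
as_nullable_int = as_float.astype('Int64')     # nullable integer, the missing value becomes <NA>

print(as_float.dtype, as_nullable_int.dtype)
print(as_nullable_int.mean())                  # the missing value is skipped
```

Note the capital "I" in `'Int64'`: lowercase `'int64'` is the ordinary NumPy integer type, which cannot hold missing values.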
na_vals = ['NA','Missing']
df = pd.read_csv('data/survey_results_public.csv',index_col='Respondent',na_values=na_vals)  # na_values: extra strings to treat as missing
schema_df = pd.read_csv('data/survey_results_schema.csv',index_col='Column')
pd.set_option('display.max_columns', 85)
pd.set_option('display.max_rows', 85)
df.head(3)
| MainBranch | Hobbyist | OpenSourcer | OpenSource | Employment | Country | Student | EdLevel | UndergradMajor | EduOther | OrgSize | DevType | YearsCode | Age1stCode | YearsCodePro | CareerSat | JobSat | MgrIdiot | MgrMoney | MgrWant | JobSeek | LastHireDate | LastInt | FizzBuzz | JobFactors | ResumeUpdate | CurrencySymbol | CurrencyDesc | CompTotal | CompFreq | ConvertedComp | WorkWeekHrs | WorkPlan | WorkChallenge | WorkRemote | WorkLoc | ImpSyn | CodeRev | CodeRevHrs | UnitTests | PurchaseHow | PurchaseWhat | LanguageWorkedWith | LanguageDesireNextYear | DatabaseWorkedWith | DatabaseDesireNextYear | PlatformWorkedWith | PlatformDesireNextYear | WebFrameWorkedWith | WebFrameDesireNextYear | MiscTechWorkedWith | MiscTechDesireNextYear | DevEnviron | OpSys | Containers | BlockchainOrg | BlockchainIs | BetterLife | ITperson | OffOn | SocialMedia | Extraversion | ScreenName | SOVisit1st | SOVisitFreq | SOVisitTo | SOFindAnswer | SOTimeSaved | SOHowMuchTime | SOAccount | SOPartFreq | SOJobs | EntTeams | SOComm | WelcomeChange | SONewContent | Age | Gender | Trans | Sexuality | Ethnicity | Dependents | SurveyLength | SurveyEase |
|---|
| Respondent | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
|---|
| 1 | I am a student who is learning to code | Yes | Never | The quality of OSS and closed source software ... | Not employed, and not looking for work | United Kingdom | No | Primary/elementary school | NaN | Taught yourself a new language, framework, or ... | NaN | NaN | 4 | 10 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | HTML/CSS;Java;JavaScript;Python | C;C++;C#;Go;HTML/CSS;Java;JavaScript;Python;SQL | SQLite | MySQL | MacOS;Windows | Android;Arduino;Windows | Django;Flask | Flask;jQuery | Node.js | Node.js | IntelliJ;Notepad++;PyCharm | Windows | I do not use containers | NaN | NaN | Yes | Fortunately, someone else has that title | Yes | Twitter | Online | Username | 2017 | A few times per month or weekly | Find answers to specific questions;Learn how t... | 3-5 times per week | Stack Overflow was much faster | 31-60 minutes | No | NaN | No, I didn't know that Stack Overflow had a jo... | No, and I don't know what those are | Neutral | Just as welcome now as I felt last year | Tech articles written by other developers;Indu... | 14.0 | Man | No | Straight / Heterosexual | NaN | No | Appropriate in length | Neither easy nor difficult |
|---|
| 2 | I am a student who is learning to code | No | Less than once per year | The quality of OSS and closed source software ... | Not employed, but looking for work | Bosnia and Herzegovina | Yes, full-time | Secondary school (e.g. American high school, G... | NaN | Taken an online course in programming or softw... | NaN | Developer, desktop or enterprise applications;... | NaN | 17 | NaN | NaN | NaN | NaN | NaN | NaN | I am actively looking for a job | I've never had a job | NaN | NaN | Financial performance or funding status of the... | Something else changed (education, award, medi... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | C++;HTML/CSS;Python | C++;HTML/CSS;JavaScript;SQL | NaN | MySQL | Windows | Windows | Django | Django | NaN | NaN | Atom;PyCharm | Windows | I do not use containers | NaN | Useful across many domains and could change ma... | Yes | Yes | Yes | Instagram | Online | Username | 2017 | Daily or almost daily | Find answers to specific questions;Learn how t... | 3-5 times per week | Stack Overflow was much faster | 11-30 minutes | Yes | A few times per month or weekly | No, I knew that Stack Overflow had a job board... | No, and I don't know what those are | Yes, somewhat | Just as welcome now as I felt last year | Tech articles written by other developers;Indu... | 19.0 | Man | No | Straight / Heterosexual | NaN | No | Appropriate in length | Neither easy nor difficult |
|---|
| 3 | I am not primarily a developer, but I write co... | Yes | Never | The quality of OSS and closed source software ... | Employed full-time | Thailand | No | Bachelor’s degree (BA, BS, B.Eng., etc.) | Web development or web design | Taught yourself a new language, framework, or ... | 100 to 499 employees | Designer;Developer, back-end;Developer, front-... | 3 | 22 | 1 | Slightly satisfied | Slightly satisfied | Not at all confident | Not sure | Not sure | I’m not actively looking, but I am open to new... | 1-2 years ago | Interview with people in peer roles | No | Languages, frameworks, and other technologies ... | I was preparing for a job search | THB | Thai baht | 23000.0 | Monthly | 8820.0 | 40.0 | There's no schedule or spec; I work on what se... | Distracting work environment;Inadequate access... | Less than once per month / Never | Home | Average | No | NaN | No, but I think we should | Not sure | I have little or no influence | HTML/CSS | Elixir;HTML/CSS | PostgreSQL | PostgreSQL | NaN | NaN | NaN | Other(s): | NaN | NaN | Vim;Visual Studio Code | Linux-based | I do not use containers | NaN | NaN | Yes | Yes | Yes | Reddit | In real life (in person) | Username | 2011 | A few times per week | Find answers to specific questions;Learn how t... | 6-10 times per week | They were about the same | NaN | Yes | Less than once per month or monthly | Yes | No, I've heard of them, but I am not part of a... | Neutral | Just as welcome now as I felt last year | Tech meetups or events in your area;Courses on... | 28.0 | Man | No | Straight / Heterosexual | NaN | Yes | Appropriate in length | Neither easy nor difficult |
|---|
df['YearsCode'].head(10)
Respondent
1 4
2 NaN
3 3
4 3
5 16
6 13
7 6
8 8
9 12
10 12
Name: YearsCode, dtype: object
df['YearsCode'] = df['YearsCode'].astype(float)#
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-21-245fa41f666c> in <module>
----> 1 df['YearsCode'] = df['YearsCode'].astype(float)#
~\AppData\Roaming\Python\Python37\site-packages\pandas\core\generic.py in astype(self, dtype, copy, errors)
5813 else:
5814 # else, only a single dtype is given
-> 5815 new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
5816 return self._constructor(new_data).__finalize__(self, method="astype")
5817
~\AppData\Roaming\Python\Python37\site-packages\pandas\core\internals\managers.py in astype(self, dtype, copy, errors)
416
417 def astype(self: T, dtype, copy: bool = False, errors: str = "raise") -> T:
--> 418 return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
419
420 def convert(
~\AppData\Roaming\Python\Python37\site-packages\pandas\core\internals\managers.py in apply(self, f, align_keys, ignore_failures, **kwargs)
325 applied = b.apply(f, **kwargs)
326 else:
--> 327 applied = getattr(b, f)(**kwargs)
328 except (TypeError, NotImplementedError):
329 if not ignore_failures:
~\AppData\Roaming\Python\Python37\site-packages\pandas\core\internals\blocks.py in astype(self, dtype, copy, errors)
589 values = self.values
590
--> 591 new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
592
593 new_values = maybe_coerce_values(new_values)
~\AppData\Roaming\Python\Python37\site-packages\pandas\core\dtypes\cast.py in astype_array_safe(values, dtype, copy, errors)
1307
1308 try:
-> 1309 new_values = astype_array(values, dtype, copy=copy)
1310 except (ValueError, TypeError):
1311 # e.g. astype_nansafe can fail on object-dtype of strings
~\AppData\Roaming\Python\Python37\site-packages\pandas\core\dtypes\cast.py in astype_array(values, dtype, copy)
1255
1256 else:
-> 1257 values = astype_nansafe(values, dtype, copy=copy)
1258
1259 # in pandas we don't store numpy str dtypes, so convert to object
~\AppData\Roaming\Python\Python37\site-packages\pandas\core\dtypes\cast.py in astype_nansafe(arr, dtype, copy, skipna)
1199 if copy or is_object_dtype(arr.dtype) or is_object_dtype(dtype):
1200 # Explicit copy, or required since NumPy can't view from / to object.
-> 1201 return arr.astype(dtype, copy=True)
1202
1203 return arr.astype(dtype, copy=copy)
ValueError: could not convert string to float: 'Less than 1 year'
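float('3')  # works: returns 3.0
float('A')  # raises ValueError: could not convert string to float: 'A'
The astype(float) call fails for the same reason float('A') does: the string is not a valid number.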
df['YearsCode'].unique()
array(['4', nan, '3', '16', '13', '6', '8', '12', '2', '5', '17', '10',
'14', '35', '7', 'Less than 1 year', '30', '9', '26', '40', '19',
'15', '20', '28', '25', '1', '22', '11', '33', '50', '41', '18',
'34', '24', '23', '42', '27', '21', '36', '32', '39', '38', '31',
'37', 'More than 50 years', '29', '44', '45', '48', '46', '43',
'47', '49'], dtype=object)
df['YearsCode'] = df['YearsCode'].replace({'Less than 1 year':0,'More than 50 years':51})  # assign back rather than calling inplace=True on a selected column
df['YearsCode'].unique()
array(['4', nan, '3', '16', '13', '6', '8', '12', '2', '5', '17', '10',
'14', '35', '7', 0, '30', '9', '26', '40', '19', '15', '20', '28',
'25', '1', '22', '11', '33', '50', '41', '18', '34', '24', '23',
'42', '27', '21', '36', '32', '39', '38', '31', '37', 51, '29',
'44', '45', '48', '46', '43', '47', '49'], dtype=object)
df['YearsCode'] = df['YearsCode'].astype(float)
df['YearsCode'].mean()
11.662114216834588
df['YearsCode'].median()
9.0
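The same cleanup can be written with `pd.to_numeric`, which converts numeric strings directly to numbers. The sketch below uses a hypothetical five-value sample rather than the real `YearsCode` column, and `errors='coerce'` is an addition not used in the walkthrough above: it turns any remaining unparseable string into NaN instead of raising.

```python
import pandas as pd

# Hypothetical sample with the two text categories seen in YearsCode
years = pd.Series(['4', None, 'Less than 1 year', '16', 'More than 50 years'])

# Map the text categories to numeric strings first (0 and 51, as in the walkthrough)
mapped = years.replace({'Less than 1 year': '0', 'More than 50 years': '51'})

# errors='coerce' would silently turn any other stray text into NaN
numeric = pd.to_numeric(mapped, errors='coerce')

print(numeric.median())
```

Because coercion hides bad values, it is worth checking `unique()` first, as done above, so that nothing meaningful is turned into NaN by accident.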
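Summary
The how parameter of df.dropna():
any: drop the row if at least one value is missing
all: drop the row only if every value is missing
drop, fillna, and replace all accept an inplace parameter, which defaults to False because in-place changes such as deletions are hard to undo.
Functions that add data, such as concat, have no such parameter.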
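A minimal sketch of the summary above, on a hypothetical three-row frame: `how='any'` and `how='all'` keep different rows, and neither call touches the original frame unless `inplace=True` is passed.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: one complete row, one partial row, one fully missing row
toy = pd.DataFrame({
    'first': ['Corey', np.nan, np.nan],
    'email': ['c@example.com', 'x@example.com', np.nan],
})

any_dropped = toy.dropna(how='any')   # keeps only the complete row
all_dropped = toy.dropna(how='all')   # removes only the fully missing row

# Both calls returned new frames; toy still has all three rows
print(len(any_dropped), len(all_dropped), len(toy))
```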
Time Series Data
import pandas as pd
df = pd.read_csv('E:/Pandas/data/ETH_1h.csv')
df.head()
| | Date | Symbol | Open | High | Low | Close | Volume |
|---|---|---|---|---|---|---|---|
| 0 | 2020-03-13 08-PM | ETHUSD | 129.94 | 131.82 | 126.87 | 128.71 | 1940673.93 |
| 1 | 2020-03-13 07-PM | ETHUSD | 119.51 | 132.02 | 117.10 | 129.94 | 7579741.09 |
| 2 | 2020-03-13 06-PM | ETHUSD | 124.47 | 124.85 | 115.50 | 119.51 | 4898735.81 |
| 3 | 2020-03-13 05-PM | ETHUSD | 124.08 | 127.42 | 121.63 | 124.47 | 2753450.92 |
| 4 | 2020-03-13 04-PM | ETHUSD | 124.85 | 129.51 | 120.17 | 124.08 | 4461424.71 |
df['Date'] = pd.to_datetime(df['Date'],format='%Y-%m-%d %I-%p')
df['Date']
0 2020-03-13 20:00:00
1 2020-03-13 19:00:00
2 2020-03-13 18:00:00
3 2020-03-13 17:00:00
4 2020-03-13 16:00:00
...
23669 2017-07-01 15:00:00
23670 2017-07-01 14:00:00
23671 2017-07-01 13:00:00
23672 2017-07-01 12:00:00
23673 2017-07-01 11:00:00
Name: Date, Length: 23674, dtype: datetime64[ns]
df.loc[0,'Date'].day_name()
'Friday'
from datetime import datetime
d_parser = lambda x: datetime.strptime(x, '%Y-%m-%d %I-%p')  # pd.datetime is deprecated; use the standard datetime module
df = pd.read_csv('E:/Pandas/data/ETH_1h.csv', parse_dates=['Date'], date_parser=d_parser)
# parse_dates: the columns to parse as dates
# date_parser: the function used to parse each date string
df.head()
| | Date | Symbol | Open | High | Low | Close | Volume |
|---|---|---|---|---|---|---|---|
| 0 | 2020-03-13 20:00:00 | ETHUSD | 129.94 | 131.82 | 126.87 | 128.71 | 1940673.93 |
| 1 | 2020-03-13 19:00:00 | ETHUSD | 119.51 | 132.02 | 117.10 | 129.94 | 7579741.09 |
| 2 | 2020-03-13 18:00:00 | ETHUSD | 124.47 | 124.85 | 115.50 | 119.51 | 4898735.81 |
| 3 | 2020-03-13 17:00:00 | ETHUSD | 124.08 | 127.42 | 121.63 | 124.47 | 2753450.92 |
| 4 | 2020-03-13 16:00:00 | ETHUSD | 124.85 | 129.51 | 120.17 | 124.08 | 4461424.71 |
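Newer pandas releases deprecate the `date_parser` argument, so a version-independent route is to read the column as text and convert it afterwards with `pd.to_datetime` and an explicit `format`, as the first approach in this section does. A minimal sketch, using an in-memory CSV in place of the real file (two rows copied from the table above, with the other price columns left out):

```python
import io
import pandas as pd

# In-memory stand-in for ETH_1h.csv; only three of the original columns are kept
csv_text = """Date,Symbol,Close
2020-03-13 08-PM,ETHUSD,128.71
2020-03-13 07-PM,ETHUSD,129.94
"""

eth = pd.read_csv(io.StringIO(csv_text))

# %I is the 12-hour clock hour and %p is AM/PM, matching strings like '08-PM'
eth['Date'] = pd.to_datetime(eth['Date'], format='%Y-%m-%d %I-%p')

print(eth['Date'])
```

Spelling out the format avoids guessing: without it, pandas would have to infer what `08-PM` means.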
df['Date'].dt.day_name()  # a datetime Series exposes the .dt accessor, just as a string Series exposes .str
0 Friday
1 Friday
2 Friday
3 Friday
4 Friday
...
23669 Saturday
23670 Saturday
23671 Saturday
23672 Saturday
23673 Saturday
Name: Date, Length: 23674, dtype: object
df['DayOfWeek'] = df['Date'].dt.day_name()  # add a day-of-week column
df
| | Date | Symbol | Open | High | Low | Close | Volume | DayOfWeek |
|---|---|---|---|---|---|---|---|---|
| 0 | 2020-03-13 20:00:00 | ETHUSD | 129.94 | 131.82 | 126.87 | 128.71 | 1940673.93 | Friday |
| 1 | 2020-03-13 19:00:00 | ETHUSD | 119.51 | 132.02 | 117.10 | 129.94 | 7579741.09 | Friday |
| 2 | 2020-03-13 18:00:00 | ETHUSD | 124.47 | 124.85 | 115.50 | 119.51 | 4898735.81 | Friday |
| 3 | 2020-03-13 17:00:00 | ETHUSD | 124.08 | 127.42 | 121.63 | 124.47 | 2753450.92 | Friday |
| 4 | 2020-03-13 16:00:00 | ETHUSD | 124.85 | 129.51 | 120.17 | 124.08 | 4461424.71 | Friday |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 23669 | 2017-07-01 15:00:00 | ETHUSD | 265.74 | 272.74 | 265.00 | 272.57 | 1500282.55 | Saturday |
| 23670 | 2017-07-01 14:00:00 | ETHUSD | 268.79 | 269.90 | 265.00 | 265.74 | 1702536.85 | Saturday |
| 23671 | 2017-07-01 13:00:00 | ETHUSD | 274.83 | 274.93 | 265.00 | 268.79 | 3010787.99 | Saturday |
| 23672 | 2017-07-01 12:00:00 | ETHUSD | 275.01 | 275.01 | 271.00 | 274.83 | 824362.87 | Saturday |
| 23673 | 2017-07-01 11:00:00 | ETHUSD | 279.98 | 279.99 | 272.10 | 275.01 | 679358.87 | Saturday |
23674 rows × 8 columns
df['Date'].min()
Timestamp('2017-07-01 11:00:00')
df['Date'].max()
Timestamp('2020-03-13 20:00:00')
df['Date'].max() - df['Date'].min()
Timedelta('986 days 09:00:00')
filt = (df['Date'] >= '2019') & (df['Date'] < '2020')  # and: &; or: |; not: ~
df.loc[filt]
| | Date | Symbol | Open | High | Low | Close | Volume | DayOfWeek |
|---|---|---|---|---|---|---|---|---|
| 1749 | 2019-12-31 23:00:00 | ETHUSD | 128.33 | 128.69 | 128.14 | 128.54 | 440678.91 | Tuesday |
| 1750 | 2019-12-31 22:00:00 | ETHUSD | 128.38 | 128.69 | 127.95 | 128.33 | 554646.02 | Tuesday |
| 1751 | 2019-12-31 21:00:00 | ETHUSD | 127.86 | 128.43 | 127.72 | 128.38 | 350155.69 | Tuesday |
| 1752 | 2019-12-31 20:00:00 | ETHUSD | 127.84 | 128.34 | 127.71 | 127.86 | 428183.38 | Tuesday |
| 1753 | 2019-12-31 19:00:00 | ETHUSD | 128.69 | 128.69 | 127.60 | 127.84 | 1169847.84 | Tuesday |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 10504 | 2019-01-01 04:00:00 | ETHUSD | 130.75 | 133.96 | 130.74 | 131.96 | 2791135.37 | Tuesday |
| 10505 | 2019-01-01 03:00:00 | ETHUSD | 130.06 | 130.79 | 130.06 | 130.75 | 503732.63 | Tuesday |
| 10506 | 2019-01-01 02:00:00 | ETHUSD | 130.79 | 130.88 | 129.55 | 130.06 | 838183.43 | Tuesday |
| 10507 | 2019-01-01 01:00:00 | ETHUSD | 131.62 | 131.62 | 130.77 | 130.79 | 434917.99 | Tuesday |
| 10508 | 2019-01-01 00:00:00 | ETHUSD | 130.53 | 131.91 | 130.48 | 131.62 | 1067136.21 | Tuesday |
8760 rows × 8 columns
filt = (df['Date'] >= pd.to_datetime('2019-01-01')) & (df['Date'] < pd.to_datetime('2020-01-01'))  # and: &; or: |; not: ~
df.loc[filt]
| | Date | Symbol | Open | High | Low | Close | Volume | DayOfWeek |
|---|---|---|---|---|---|---|---|---|
| 1749 | 2019-12-31 23:00:00 | ETHUSD | 128.33 | 128.69 | 128.14 | 128.54 | 440678.91 | Tuesday |
| 1750 | 2019-12-31 22:00:00 | ETHUSD | 128.38 | 128.69 | 127.95 | 128.33 | 554646.02 | Tuesday |
| 1751 | 2019-12-31 21:00:00 | ETHUSD | 127.86 | 128.43 | 127.72 | 128.38 | 350155.69 | Tuesday |
| 1752 | 2019-12-31 20:00:00 | ETHUSD | 127.84 | 128.34 | 127.71 | 127.86 | 428183.38 | Tuesday |
| 1753 | 2019-12-31 19:00:00 | ETHUSD | 128.69 | 128.69 | 127.60 | 127.84 | 1169847.84 | Tuesday |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 10504 | 2019-01-01 04:00:00 | ETHUSD | 130.75 | 133.96 | 130.74 | 131.96 | 2791135.37 | Tuesday |
| 10505 | 2019-01-01 03:00:00 | ETHUSD | 130.06 | 130.79 | 130.06 | 130.75 | 503732.63 | Tuesday |
| 10506 | 2019-01-01 02:00:00 | ETHUSD | 130.79 | 130.88 | 129.55 | 130.06 | 838183.43 | Tuesday |
| 10507 | 2019-01-01 01:00:00 | ETHUSD | 131.62 | 131.62 | 130.77 | 130.79 | 434917.99 | Tuesday |
| 10508 | 2019-01-01 00:00:00 | ETHUSD | 130.53 | 131.91 | 130.48 | 131.62 | 1067136.21 | Tuesday |
8760 rows × 8 columns
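Boolean masks like `filt` are combined element by element with `&` (and), `|` (or), and `~` (not), and each comparison needs its own parentheses because these operators bind more tightly than `>=` or `<`. A minimal sketch on three hypothetical timestamps:

```python
import pandas as pd

# Hypothetical timestamps: one before, one inside, and one after 2019
dates = pd.Series(pd.to_datetime(['2018-12-31 23:00', '2019-06-01 12:00', '2020-01-01 00:00']))

start = pd.Timestamp('2019-01-01')
end = pd.Timestamp('2020-01-01')

in_2019 = (dates >= start) & (dates < end)   # and
outside = ~in_2019                           # not
either = (dates < start) | (dates >= end)    # or: the same rows as `outside`

print(in_2019.tolist(), outside.tolist())
```

Python's `and`, `or`, and `not` keywords do not work here, because they try to reduce a whole Series to a single truth value.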
df.set_index('Date',inplace=True)
df
| Date | Symbol | Open | High | Low | Close | Volume | DayOfWeek |
|---|---|---|---|---|---|---|---|
| 2020-03-13 20:00:00 | ETHUSD | 129.94 | 131.82 | 126.87 | 128.71 | 1940673.93 | Friday |
| 2020-03-13 19:00:00 | ETHUSD | 119.51 | 132.02 | 117.10 | 129.94 | 7579741.09 | Friday |
| 2020-03-13 18:00:00 | ETHUSD | 124.47 | 124.85 | 115.50 | 119.51 | 4898735.81 | Friday |
| 2020-03-13 17:00:00 | ETHUSD | 124.08 | 127.42 | 121.63 | 124.47 | 2753450.92 | Friday |
| 2020-03-13 16:00:00 | ETHUSD | 124.85 | 129.51 | 120.17 | 124.08 | 4461424.71 | Friday |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 2017-07-01 15:00:00 | ETHUSD | 265.74 | 272.74 | 265.00 | 272.57 | 1500282.55 | Saturday |
| 2017-07-01 14:00:00 | ETHUSD | 268.79 | 269.90 | 265.00 | 265.74 | 1702536.85 | Saturday |
| 2017-07-01 13:00:00 | ETHUSD | 274.83 | 274.93 | 265.00 | 268.79 | 3010787.99 | Saturday |
| 2017-07-01 12:00:00 | ETHUSD | 275.01 | 275.01 | 271.00 | 274.83 | 824362.87 | Saturday |
| 2017-07-01 11:00:00 | ETHUSD | 279.98 | 279.99 | 272.10 | 275.01 | 679358.87 | Saturday |
23674 rows × 7 columns
df.loc['2019']
| Date | Symbol | Open | High | Low | Close | Volume | DayOfWeek |
|---|---|---|---|---|---|---|---|
| 2019-12-31 23:00:00 | ETHUSD | 128.33 | 128.69 | 128.14 | 128.54 | 440678.91 | Tuesday |
| 2019-12-31 22:00:00 | ETHUSD | 128.38 | 128.69 | 127.95 | 128.33 | 554646.02 | Tuesday |
| 2019-12-31 21:00:00 | ETHUSD | 127.86 | 128.43 | 127.72 | 128.38 | 350155.69 | Tuesday |
| 2019-12-31 20:00:00 | ETHUSD | 127.84 | 128.34 | 127.71 | 127.86 | 428183.38 | Tuesday |
| 2019-12-31 19:00:00 | ETHUSD | 128.69 | 128.69 | 127.60 | 127.84 | 1169847.84 | Tuesday |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 2019-01-01 04:00:00 | ETHUSD | 130.75 | 133.96 | 130.74 | 131.96 | 2791135.37 | Tuesday |
| 2019-01-01 03:00:00 | ETHUSD | 130.06 | 130.79 | 130.06 | 130.75 | 503732.63 | Tuesday |
| 2019-01-01 02:00:00 | ETHUSD | 130.79 | 130.88 | 129.55 | 130.06 | 838183.43 | Tuesday |
| 2019-01-01 01:00:00 | ETHUSD | 131.62 | 131.62 | 130.77 | 130.79 | 434917.99 | Tuesday |
| 2019-01-01 00:00:00 | ETHUSD | 130.53 | 131.91 | 130.48 | 131.62 | 1067136.21 | Tuesday |
8760 rows × 7 columns
df.loc['2020-01':'2020-02']
| Date | Symbol | Open | High | Low | Close | Volume | DayOfWeek |
|---|---|---|---|---|---|---|---|
| 2020-02-29 23:00:00 | ETHUSD | 223.35 | 223.58 | 216.83 | 217.31 | 1927939.88 | Saturday |
| 2020-02-29 22:00:00 | ETHUSD | 223.48 | 223.59 | 222.14 | 223.35 | 535998.57 | Saturday |
| 2020-02-29 21:00:00 | ETHUSD | 224.63 | 225.14 | 222.74 | 223.48 | 561158.03 | Saturday |
| 2020-02-29 20:00:00 | ETHUSD | 225.31 | 225.33 | 223.50 | 224.63 | 511648.65 | Saturday |
| 2020-02-29 19:00:00 | ETHUSD | 225.09 | 225.85 | 223.87 | 225.31 | 1250856.20 | Saturday |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 2020-01-01 04:00:00 | ETHUSD | 129.57 | 130.00 | 129.50 | 129.56 | 702786.82 | Wednesday |
| 2020-01-01 03:00:00 | ETHUSD | 130.37 | 130.44 | 129.38 | 129.57 | 496704.23 | Wednesday |
| 2020-01-01 02:00:00 | ETHUSD | 130.14 | 130.50 | 129.91 | 130.37 | 396315.72 | Wednesday |
| 2020-01-01 01:00:00 | ETHUSD | 128.34 | 130.14 | 128.32 | 130.14 | 635419.40 | Wednesday |
| 2020-01-01 00:00:00 | ETHUSD | 128.54 | 128.54 | 128.12 | 128.34 | 245119.91 | Wednesday |
1440 rows × 7 columns
df.loc['2020-01':'2020-02']['Close'].mean()
195.16559027777814
df.loc['2020-01-01']['High'].max()
132.68
df['High'].resample('D').max()
Date
2017-07-01 279.99
2017-07-02 293.73
2017-07-03 285.00
2017-07-04 282.83
2017-07-05 274.97
...
2020-03-09 208.65
2020-03-10 206.28
2020-03-11 202.98
2020-03-12 195.64
2020-03-13 148.00
Freq: D, Name: High, Length: 987, dtype: float64
highs = df['High'].resample('D').max()
highs['2020-01-01']
132.68
%matplotlib inline
highs.plot()
df.resample('W').mean(numeric_only=True)  # average the numeric columns only; Symbol and DayOfWeek are text
| Date | Open | High | Low | Close | Volume |
|---|---|---|---|---|---|
| 2017-07-02 | 268.066486 | 271.124595 | 264.819730 | 268.202162 | 2.185035e+06 |
| 2017-07-09 | 261.337024 | 262.872917 | 259.186190 | 261.062083 | 1.337349e+06 |
| 2017-07-16 | 196.193214 | 199.204405 | 192.722321 | 195.698393 | 2.986756e+06 |
| 2017-07-23 | 212.351429 | 215.779286 | 209.126310 | 212.783750 | 4.298593e+06 |
| 2017-07-30 | 203.496190 | 205.110357 | 201.714048 | 203.309524 | 1.581729e+06 |
| ... | ... | ... | ... | ... | ... |
| 2020-02-16 | 255.021667 | 257.255238 | 252.679762 | 255.198452 | 2.329087e+06 |
| 2020-02-23 | 265.220833 | 267.263690 | 262.948512 | 265.321905 | 1.826094e+06 |
| 2020-03-01 | 236.720536 | 238.697500 | 234.208750 | 236.373988 | 2.198762e+06 |
| 2020-03-08 | 229.923571 | 231.284583 | 228.373810 | 229.817619 | 1.628910e+06 |
| 2020-03-15 | 176.937521 | 179.979487 | 172.936239 | 176.332821 | 4.259828e+06 |
142 rows × 5 columns
df.resample('W').agg({'Close':'mean','High':'max','Low':'min','Volume':'sum'})
| Date | Close | High | Low | Volume |
|---|---|---|---|---|
| 2017-07-02 | 268.202162 | 293.73 | 253.23 | 8.084631e+07 |
| 2017-07-09 | 261.062083 | 285.00 | 231.25 | 2.246746e+08 |
| 2017-07-16 | 195.698393 | 240.33 | 130.26 | 5.017750e+08 |
| 2017-07-23 | 212.783750 | 249.40 | 153.25 | 7.221637e+08 |
| 2017-07-30 | 203.309524 | 229.99 | 178.03 | 2.657305e+08 |
| ... | ... | ... | ... | ... |
| 2020-02-16 | 255.198452 | 290.00 | 216.31 | 3.912867e+08 |
| 2020-02-23 | 265.321905 | 287.13 | 242.36 | 3.067838e+08 |
| 2020-03-01 | 236.373988 | 278.13 | 209.26 | 3.693920e+08 |
| 2020-03-08 | 229.817619 | 253.01 | 196.00 | 2.736569e+08 |
| 2020-03-15 | 176.332821 | 208.65 | 90.00 | 4.983998e+08 |
142 rows × 4 columns
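Summary
resample is similar to groupby, except that it groups by time, for example one group per day or one group per week.
The dt accessor of a Series exposes attributes such as year, month, and day.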
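To make the comparison with groupby concrete, the sketch below builds a hypothetical 48-hour series (the values are just 0 to 47, not ETH prices) and computes the daily maximum both ways. The `groupby` variant groups on the calendar date of each timestamp, which is one possible way to mimic `resample('D')` for this simple case:

```python
import numpy as np
import pandas as pd

# Hypothetical hourly series: 48 consecutive hours starting at 2020-01-01 00:00
idx = pd.Timestamp('2020-01-01') + pd.to_timedelta(np.arange(48), unit='h')
prices = pd.Series(np.arange(48, dtype=float), index=idx)

daily_resample = prices.resample('D').max()              # one group per calendar day
daily_groupby = prices.groupby(prices.index.date).max()  # same grouping, spelled out by hand

# The .dt accessor exposes the parts of each timestamp in a datetime Series
stamps = pd.Series(idx)
first_parts = (stamps.dt.year.iloc[0], stamps.dt.month.iloc[0], stamps.dt.day.iloc[0])

print(daily_resample.tolist(), daily_groupby.tolist(), first_parts)
```

One practical difference is that `resample` also emits a row for periods that contain no data, whereas this `groupby` only returns the dates that actually occur.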
Reading and Writing Data
df = pd.read_csv('data/survey_results_public.csv',index_col='Respondent',na_values=na_vals)  # na_values: extra strings to treat as missing
schema_df = pd.read_csv('data/survey_results_schema.csv',index_col='Column')
pd.set_option('display.max_columns', 85)
pd.set_option('display.max_rows', 85)
df.head(3)
| MainBranch | Hobbyist | OpenSourcer | OpenSource | Employment | Country | Student | EdLevel | UndergradMajor | EduOther | OrgSize | DevType | YearsCode | Age1stCode | YearsCodePro | CareerSat | JobSat | MgrIdiot | MgrMoney | MgrWant | JobSeek | LastHireDate | LastInt | FizzBuzz | JobFactors | ResumeUpdate | CurrencySymbol | CurrencyDesc | CompTotal | CompFreq | ConvertedComp | WorkWeekHrs | WorkPlan | WorkChallenge | WorkRemote | WorkLoc | ImpSyn | CodeRev | CodeRevHrs | UnitTests | PurchaseHow | PurchaseWhat | LanguageWorkedWith | LanguageDesireNextYear | DatabaseWorkedWith | DatabaseDesireNextYear | PlatformWorkedWith | PlatformDesireNextYear | WebFrameWorkedWith | WebFrameDesireNextYear | MiscTechWorkedWith | MiscTechDesireNextYear | DevEnviron | OpSys | Containers | BlockchainOrg | BlockchainIs | BetterLife | ITperson | OffOn | SocialMedia | Extraversion | ScreenName | SOVisit1st | SOVisitFreq | SOVisitTo | SOFindAnswer | SOTimeSaved | SOHowMuchTime | SOAccount | SOPartFreq | SOJobs | EntTeams | SOComm | WelcomeChange | SONewContent | Age | Gender | Trans | Sexuality | Ethnicity | Dependents | SurveyLength | SurveyEase |
|---|
| Respondent | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
|---|
| 1 | I am a student who is learning to code | Yes | Never | The quality of OSS and closed source software ... | Not employed, and not looking for work | United Kingdom | No | Primary/elementary school | NaN | Taught yourself a new language, framework, or ... | NaN | NaN | 4 | 10 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | HTML/CSS;Java;JavaScript;Python | C;C++;C#;Go;HTML/CSS;Java;JavaScript;Python;SQL | SQLite | MySQL | MacOS;Windows | Android;Arduino;Windows | Django;Flask | Flask;jQuery | Node.js | Node.js | IntelliJ;Notepad++;PyCharm | Windows | I do not use containers | NaN | NaN | Yes | Fortunately, someone else has that title | Yes | Twitter | Online | Username | 2017 | A few times per month or weekly | Find answers to specific questions;Learn how t... | 3-5 times per week | Stack Overflow was much faster | 31-60 minutes | No | NaN | No, I didn't know that Stack Overflow had a jo... | No, and I don't know what those are | Neutral | Just as welcome now as I felt last year | Tech articles written by other developers;Indu... | 14.0 | Man | No | Straight / Heterosexual | NaN | No | Appropriate in length | Neither easy nor difficult |
|---|
| 2 | I am a student who is learning to code | No | Less than once per year | The quality of OSS and closed source software ... | Not employed, but looking for work | Bosnia and Herzegovina | Yes, full-time | Secondary school (e.g. American high school, G... | NaN | Taken an online course in programming or softw... | NaN | Developer, desktop or enterprise applications;... | NaN | 17 | NaN | NaN | NaN | NaN | NaN | NaN | I am actively looking for a job | I've never had a job | NaN | NaN | Financial performance or funding status of the... | Something else changed (education, award, medi... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | C++;HTML/CSS;Python | C++;HTML/CSS;JavaScript;SQL | NaN | MySQL | Windows | Windows | Django | Django | NaN | NaN | Atom;PyCharm | Windows | I do not use containers | NaN | Useful across many domains and could change ma... | Yes | Yes | Yes | Instagram | Online | Username | 2017 | Daily or almost daily | Find answers to specific questions;Learn how t... | 3-5 times per week | Stack Overflow was much faster | 11-30 minutes | Yes | A few times per month or weekly | No, I knew that Stack Overflow had a job board... | No, and I don't know what those are | Yes, somewhat | Just as welcome now as I felt last year | Tech articles written by other developers;Indu... | 19.0 | Man | No | Straight / Heterosexual | NaN | No | Appropriate in length | Neither easy nor difficult |
|---|
| 3 | I am not primarily a developer, but I write co... | Yes | Never | The quality of OSS and closed source software ... | Employed full-time | Thailand | No | Bachelor’s degree (BA, BS, B.Eng., etc.) | Web development or web design | Taught yourself a new language, framework, or ... | 100 to 499 employees | Designer;Developer, back-end;Developer, front-... | 3 | 22 | 1 | Slightly satisfied | Slightly satisfied | Not at all confident | Not sure | Not sure | I’m not actively looking, but I am open to new... | 1-2 years ago | Interview with people in peer roles | No | Languages, frameworks, and other technologies ... | I was preparing for a job search | THB | Thai baht | 23000.0 | Monthly | 8820.0 | 40.0 | There's no schedule or spec; I work on what se... | Distracting work environment;Inadequate access... | Less than once per month / Never | Home | Average | No | NaN | No, but I think we should | Not sure | I have little or no influence | HTML/CSS | Elixir;HTML/CSS | PostgreSQL | PostgreSQL | NaN | NaN | NaN | Other(s): | NaN | NaN | Vim;Visual Studio Code | Linux-based | I do not use containers | NaN | NaN | Yes | Yes | Yes | Reddit | In real life (in person) | Username | 2011 | A few times per week | Find answers to specific questions;Learn how t... | 6-10 times per week | They were about the same | NaN | Yes | Less than once per month or monthly | Yes | No, I've heard of them, but I am not part of a... | Neutral | Just as welcome now as I felt last year | Tech meetups or events in your area;Courses on... | 28.0 | Man | No | Straight / Heterosexual | NaN | Yes | Appropriate in length | Neither easy nor difficult |
|---|
# Boolean mask: True for every respondent whose Country is India
filt = (df['Country'] == 'India')
India_df = df.loc[filt]
India_df.head()
| MainBranch | Hobbyist | OpenSourcer | OpenSource | Employment | Country | Student | EdLevel | UndergradMajor | EduOther | OrgSize | DevType | YearsCode | Age1stCode | YearsCodePro | CareerSat | JobSat | MgrIdiot | MgrMoney | MgrWant | JobSeek | LastHireDate | LastInt | FizzBuzz | JobFactors | ResumeUpdate | CurrencySymbol | CurrencyDesc | CompTotal | CompFreq | ConvertedComp | WorkWeekHrs | WorkPlan | WorkChallenge | WorkRemote | WorkLoc | ImpSyn | CodeRev | CodeRevHrs | UnitTests | PurchaseHow | PurchaseWhat | LanguageWorkedWith | LanguageDesireNextYear | DatabaseWorkedWith | DatabaseDesireNextYear | PlatformWorkedWith | PlatformDesireNextYear | WebFrameWorkedWith | WebFrameDesireNextYear | MiscTechWorkedWith | MiscTechDesireNextYear | DevEnviron | OpSys | Containers | BlockchainOrg | BlockchainIs | BetterLife | ITperson | OffOn | SocialMedia | Extraversion | ScreenName | SOVisit1st | SOVisitFreq | SOVisitTo | SOFindAnswer | SOTimeSaved | SOHowMuchTime | SOAccount | SOPartFreq | SOJobs | EntTeams | SOComm | WelcomeChange | SONewContent | Age | Gender | Trans | Sexuality | Ethnicity | Dependents | SurveyLength | SurveyEase |
|---|
| Respondent | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
|---|
| 8 | I code primarily as a hobby | Yes | Less than once per year | OSS is, on average, of HIGHER quality than pro... | Not employed, but looking for work | India | NaN | Bachelor’s degree (BA, BS, B.Eng., etc.) | Computer science, computer engineering, or sof... | Taught yourself a new language, framework, or ... | NaN | Developer, back-end;Engineer, site reliability | 8 | 16 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Bash/Shell/PowerShell;C;C++;HTML/CSS;Java;Java... | Bash/Shell/PowerShell;C;C++;Elixir;Erlang;Go;P... | Cassandra;Elasticsearch;MongoDB;MySQL;Oracle;R... | Cassandra;DynamoDB;Elasticsearch;Firebase;Mong... | AWS;Docker;Heroku;Linux;MacOS;Slack | Android;Arduino;AWS;Docker;Google Cloud Platfo... | Express;Flask;React.js;Spring | Django;Express;Flask;React.js;Vue.js | Hadoop;Node.js;Pandas | Ansible;Apache Spark;Chef;Hadoop;Node.js;Panda... | Atom;IntelliJ;IPython / Jupyter;PyCharm;Visual... | Linux-based | Development;Testing;Production;Outside of work... | NaN | Useful across many domains and could change ma... | Yes | SIGH | Yes | YouTube | In real life (in person) | Handle | 2012 | A few times per week | Find answers to specific questions;Learn how t... | Less than once per week | Stack Overflow was slightly faster | 11-30 minutes | Yes | Less than once per month or monthly | Yes | No, and I don't know what those are | Yes, definitely | A lot more welcome now than last year | Tech articles written by other developers;Indu... | 24.0 | Man | No | Straight / Heterosexual | NaN | NaN | Appropriate in length | Neither easy nor difficult |
|---|
| 10 | I am a developer by profession | Yes | Once a month or more often | OSS is, on average, of HIGHER quality than pro... | Employed full-time | India | No | Master’s degree (MA, MS, M.Eng., MBA, etc.) | NaN | NaN | 10,000 or more employees | Data or business analyst;Data scientist or mac... | 12 | 20 | 10 | Slightly dissatisfied | Slightly dissatisfied | Somewhat confident | Yes | Yes | I’m not actively looking, but I am open to new... | 3-4 years ago | NaN | No | Languages, frameworks, and other technologies ... | NaN | INR | Indian rupee | 950000.0 | Yearly | 13293.0 | 70.0 | There's no schedule or spec; I work on what se... | NaN | A few days each month | Home | Far above average | Yes, because I see value in code review | 4.0 | Yes, it's part of our process | NaN | NaN | C#;Go;JavaScript;Python;R;SQL | C#;Go;JavaScript;Kotlin;Python;R;SQL | Elasticsearch;MongoDB;Microsoft SQL Server;MyS... | Elasticsearch;MongoDB;Microsoft SQL Server | Linux;Windows | Android;Linux;Raspberry Pi;Windows | Angular/Angular.js;ASP.NET;Django;Express;Flas... | Angular/Angular.js;ASP.NET;Django;Express;Flas... | .NET;Node.js;Pandas;Torch/PyTorch | .NET;Node.js;TensorFlow;Torch/PyTorch | Android Studio;Eclipse;IPython / Jupyter;Notep... | Windows | NaN | Not at all | Useful for immutable record keeping outside of... | No | Yes | Yes | YouTube | Neither | Screen Name | NaN | Multiple times per day | Find answers to specific questions;Get a sense... | 3-5 times per week | They were about the same | NaN | Yes | A few times per month or weekly | Yes | No, and I don't know what those are | Yes, somewhat | Somewhat less welcome now than last year | Tech articles written by other developers;Tech... | NaN | NaN | NaN | NaN | NaN | Yes | Too long | Difficult |
|---|
| 15 | I am a student who is learning to code | Yes | Never | OSS is, on average, of HIGHER quality than pro... | Not employed, but looking for work | India | Yes, full-time | Secondary school (e.g. American high school, G... | NaN | Taken an online course in programming or softw... | NaN | Student | 3 | 13 | NaN | NaN | NaN | NaN | NaN | NaN | I’m not actively looking, but I am open to new... | I've never had a job | NaN | NaN | Industry that I'd be working in;Languages, fra... | Something else changed (education, award, medi... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Assembly;Bash/Shell/PowerShell;C;C++;HTML/CSS;... | Assembly;Bash/Shell/PowerShell;C;C++;C#;Go;HTM... | MariaDB;MySQL;Oracle;SQLite | MariaDB;MongoDB;Microsoft SQL Server;MySQL;Ora... | Linux;Windows | Android;Google Cloud Platform;iOS;Linux;MacOS;... | NaN | Angular/Angular.js;ASP.NET;Django;Drupal;jQuer... | NaN | .NET;.NET Core;Node.js;TensorFlow;Unity 3D;Unr... | Atom;NetBeans;Notepad++;Sublime Text;Vim | Linux-based | Development | NaN | NaN | Yes | Yes | What? | YouTube | In real life (in person) | NaN | 2018 | Daily or almost daily | Find answers to specific questions;Learn how t... | More than 10 times per week | They were about the same | NaN | Yes | Less than once per month or monthly | Yes | No, I've heard of them, but I am not part of a... | Yes, somewhat | Just as welcome now as I felt last year | Tech articles written by other developers;Indu... | 20.0 | Man | No | NaN | NaN | Yes | Too long | Neither easy nor difficult |
|---|
| 50 | I am a developer by profession | Yes | Once a month or more often | OSS is, on average, of LOWER quality than prop... | Employed full-time | India | No | Bachelor’s degree (BA, BS, B.Eng., etc.) | Another engineering discipline (ex. civil, ele... | Received on-the-job training in software devel... | 10,000 or more employees | Developer, back-end;DevOps specialist | 7 | 15 | 2 | Slightly satisfied | Very satisfied | Very confident | Not sure | Yes | I’m not actively looking, but I am open to new... | 1-2 years ago | Write code by hand (e.g., on a whiteboard);Int... | No | Specific department or team I'd be working on;... | I was preparing for a job search | INR | Indian rupee | 400000.0 | Yearly | 5597.0 | 7.0 | There is a schedule and/or spec (made by me or... | Meetings;Time spent commuting | Less than once per month / Never | Other place, such as a coworking space or cafe | Average | No | NaN | Yes, it's not part of our process but the deve... | The CTO, CIO, or other management purchase new... | I have little or no influence | Bash/Shell/PowerShell;C;C++;HTML/CSS;Java;Java... | HTML/CSS;JavaScript;Python | Elasticsearch;Firebase;MariaDB;MongoDB;MySQL;O... | Firebase;PostgreSQL;Redis;Other(s): | Arduino;AWS;Heroku;Linux;MacOS;Raspberry Pi;Wo... | AWS;Docker;Heroku;Kubernetes;Linux;MacOS;WordP... | Django;Express;Flask;jQuery | Express;Flask;jQuery;React.js;Vue.js | Node.js | Node.js | Notepad++;Visual Studio Code | MacOS | Testing | Not at all | Useful for immutable record keeping outside of... | Yes | Also Yes | What? | YouTube | In real life (in person) | Username | 2012 | Daily or almost daily | Find answers to specific questions;Learn how t... | 3-5 times per week | Stack Overflow was slightly faster | 11-30 minutes | Yes | Less than once per month or monthly | No, I knew that Stack Overflow had a job board... | No, and I don't know what those are | Yes, definitely | Just as welcome now as I felt last year | Tech articles written by other developers;Tech... | 23.0 | Man | No | NaN | South Asian | No | Too long | Easy |
|---|
| 65 | I am a developer by profession | Yes | Never | NaN | Employed full-time | India | No | Bachelor’s degree (BA, BS, B.Eng., etc.) | Information systems, information technology, o... | NaN | 20 to 99 employees | Developer, front-end;Developer, mobile | 2 | 17 | 2 | Very satisfied | Very satisfied | Very confident | No | Not sure | I’m not actively looking, but I am open to new... | Less than a year ago | Write any code;Solve a brain-teaser style puzz... | No | Languages, frameworks, and other technologies ... | My job status changed (promotion, new job, etc.) | INR | Indian rupee | NaN | Monthly | NaN | 48.0 | There's no schedule or spec; I work on what se... | NaN | About half the time | Office | Average | Yes, because I see value in code review | NaN | Yes, it's not part of our process but the deve... | Not sure | NaN | Assembly;C;C++;C#;HTML/CSS;Java | Kotlin | Firebase;MySQL;Oracle;SQLite | Firebase;SQLite | Android | Android | ASP.NET | NaN | NaN | NaN | Android Studio;IntelliJ | Linux-based | NaN | NaN | NaN | Yes | Yes | What? | WhatsApp | In real life (in person) | NaN | 2017 | Multiple times per day | Find answers to specific questions | More than 10 times per week | Stack Overflow was slightly faster | 11-30 minutes | Yes | A few times per week | No, I knew that Stack Overflow had a job board... | No, and I don't know what those are | Not sure | A lot more welcome now than last year | NaN | 21.0 | Man | No | NaN | NaN | Yes | Appropriate in length | Neither easy nor difficult |
|---|
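The filter used to build `India_df` is a boolean mask passed to `.loc`. As a minimal sketch of how that works, the snippet below applies the same pattern to a tiny, hypothetical DataFrame (`toy`, with invented values; it is not part of the survey data). It also shows that the `Respondent` index survives the filter, which is why that index later appears as the first column of the exported files.

```python
import pandas as pd

# Hypothetical miniature of the survey data; the values are invented for illustration.
toy = pd.DataFrame(
    {"Country": ["India", "Thailand", "India"], "YearsCode": [8, 3, 12]},
    index=pd.Index([8, 3, 10], name="Respondent"),
)

mask = toy["Country"] == "India"  # boolean Series aligned on the Respondent index
india_toy = toy.loc[mask]         # keeps only the rows where the mask is True

print(mask.tolist())
print(india_toy.index.tolist())
```

The comparison produces one `True`/`False` value per row, and `.loc` selects the rows marked `True` without renumbering them.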
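# Write the filtered rows to CSV (comma-separated) and TSV (tab-separated)
India_df.to_csv('data/modified.csv')
India_df.to_csv('data/modified.tsv', sep='\t')
# Writing .xlsx files needs an Excel engine such as openpyxl to be installed
India_df.to_excel('data/modified.xlsx')
# Read the workbook back, restoring Respondent as the index
test = pd.read_excel('data/modified.xlsx', index_col='Respondent')
test.head()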
| MainBranch | Hobbyist | OpenSourcer | OpenSource | Employment | Country | Student | EdLevel | UndergradMajor | EduOther | OrgSize | DevType | YearsCode | Age1stCode | YearsCodePro | CareerSat | JobSat | MgrIdiot | MgrMoney | MgrWant | JobSeek | LastHireDate | LastInt | FizzBuzz | JobFactors | ResumeUpdate | CurrencySymbol | CurrencyDesc | CompTotal | CompFreq | ConvertedComp | WorkWeekHrs | WorkPlan | WorkChallenge | WorkRemote | WorkLoc | ImpSyn | CodeRev | CodeRevHrs | UnitTests | PurchaseHow | PurchaseWhat | LanguageWorkedWith | LanguageDesireNextYear | DatabaseWorkedWith | DatabaseDesireNextYear | PlatformWorkedWith | PlatformDesireNextYear | WebFrameWorkedWith | WebFrameDesireNextYear | MiscTechWorkedWith | MiscTechDesireNextYear | DevEnviron | OpSys | Containers | BlockchainOrg | BlockchainIs | BetterLife | ITperson | OffOn | SocialMedia | Extraversion | ScreenName | SOVisit1st | SOVisitFreq | SOVisitTo | SOFindAnswer | SOTimeSaved | SOHowMuchTime | SOAccount | SOPartFreq | SOJobs | EntTeams | SOComm | WelcomeChange | SONewContent | Age | Gender | Trans | Sexuality | Ethnicity | Dependents | SurveyLength | SurveyEase |
|---|
| Respondent | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
|---|
| 8 | I code primarily as a hobby | Yes | Less than once per year | OSS is, on average, of HIGHER quality than pro... | Not employed, but looking for work | India | NaN | Bachelor’s degree (BA, BS, B.Eng., etc.) | Computer science, computer engineering, or sof... | Taught yourself a new language, framework, or ... | NaN | Developer, back-end;Engineer, site reliability | 8 | 16 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Bash/Shell/PowerShell;C;C++;HTML/CSS;Java;Java... | Bash/Shell/PowerShell;C;C++;Elixir;Erlang;Go;P... | Cassandra;Elasticsearch;MongoDB;MySQL;Oracle;R... | Cassandra;DynamoDB;Elasticsearch;Firebase;Mong... | AWS;Docker;Heroku;Linux;MacOS;Slack | Android;Arduino;AWS;Docker;Google Cloud Platfo... | Express;Flask;React.js;Spring | Django;Express;Flask;React.js;Vue.js | Hadoop;Node.js;Pandas | Ansible;Apache Spark;Chef;Hadoop;Node.js;Panda... | Atom;IntelliJ;IPython / Jupyter;PyCharm;Visual... | Linux-based | Development;Testing;Production;Outside of work... | NaN | Useful across many domains and could change ma... | Yes | SIGH | Yes | YouTube | In real life (in person) | Handle | 2012 | A few times per week | Find answers to specific questions;Learn how t... | Less than once per week | Stack Overflow was slightly faster | 11-30 minutes | Yes | Less than once per month or monthly | Yes | No, and I don't know what those are | Yes, definitely | A lot more welcome now than last year | Tech articles written by other developers;Indu... | 24.0 | Man | No | Straight / Heterosexual | NaN | NaN | Appropriate in length | Neither easy nor difficult |
|---|
| 10 | I am a developer by profession | Yes | Once a month or more often | OSS is, on average, of HIGHER quality than pro... | Employed full-time | India | No | Master’s degree (MA, MS, M.Eng., MBA, etc.) | NaN | NaN | 10,000 or more employees | Data or business analyst;Data scientist or mac... | 12 | 20 | 10 | Slightly dissatisfied | Slightly dissatisfied | Somewhat confident | Yes | Yes | I’m not actively looking, but I am open to new... | 3-4 years ago | NaN | No | Languages, frameworks, and other technologies ... | NaN | INR | Indian rupee | 950000.0 | Yearly | 13293.0 | 70.0 | There's no schedule or spec; I work on what se... | NaN | A few days each month | Home | Far above average | Yes, because I see value in code review | 4.0 | Yes, it's part of our process | NaN | NaN | C#;Go;JavaScript;Python;R;SQL | C#;Go;JavaScript;Kotlin;Python;R;SQL | Elasticsearch;MongoDB;Microsoft SQL Server;MyS... | Elasticsearch;MongoDB;Microsoft SQL Server | Linux;Windows | Android;Linux;Raspberry Pi;Windows | Angular/Angular.js;ASP.NET;Django;Express;Flas... | Angular/Angular.js;ASP.NET;Django;Express;Flas... | .NET;Node.js;Pandas;Torch/PyTorch | .NET;Node.js;TensorFlow;Torch/PyTorch | Android Studio;Eclipse;IPython / Jupyter;Notep... | Windows | NaN | Not at all | Useful for immutable record keeping outside of... | No | Yes | Yes | YouTube | Neither | Screen Name | NaN | Multiple times per day | Find answers to specific questions;Get a sense... | 3-5 times per week | They were about the same | NaN | Yes | A few times per month or weekly | Yes | No, and I don't know what those are | Yes, somewhat | Somewhat less welcome now than last year | Tech articles written by other developers;Tech... | NaN | NaN | NaN | NaN | NaN | Yes | Too long | Difficult |
|---|
| 15 | I am a student who is learning to code | Yes | Never | OSS is, on average, of HIGHER quality than pro... | Not employed, but looking for work | India | Yes, full-time | Secondary school (e.g. American high school, G... | NaN | Taken an online course in programming or softw... | NaN | Student | 3 | 13 | NaN | NaN | NaN | NaN | NaN | NaN | I’m not actively looking, but I am open to new... | I've never had a job | NaN | NaN | Industry that I'd be working in;Languages, fra... | Something else changed (education, award, medi... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Assembly;Bash/Shell/PowerShell;C;C++;HTML/CSS;... | Assembly;Bash/Shell/PowerShell;C;C++;C#;Go;HTM... | MariaDB;MySQL;Oracle;SQLite | MariaDB;MongoDB;Microsoft SQL Server;MySQL;Ora... | Linux;Windows | Android;Google Cloud Platform;iOS;Linux;MacOS;... | NaN | Angular/Angular.js;ASP.NET;Django;Drupal;jQuer... | NaN | .NET;.NET Core;Node.js;TensorFlow;Unity 3D;Unr... | Atom;NetBeans;Notepad++;Sublime Text;Vim | Linux-based | Development | NaN | NaN | Yes | Yes | What? | YouTube | In real life (in person) | NaN | 2018 | Daily or almost daily | Find answers to specific questions;Learn how t... | More than 10 times per week | They were about the same | NaN | Yes | Less than once per month or monthly | Yes | No, I've heard of them, but I am not part of a... | Yes, somewhat | Just as welcome now as I felt last year | Tech articles written by other developers;Indu... | 20.0 | Man | No | NaN | NaN | Yes | Too long | Neither easy nor difficult |
|---|
| 50 | I am a developer by profession | Yes | Once a month or more often | OSS is, on average, of LOWER quality than prop... | Employed full-time | India | No | Bachelor’s degree (BA, BS, B.Eng., etc.) | Another engineering discipline (ex. civil, ele... | Received on-the-job training in software devel... | 10,000 or more employees | Developer, back-end;DevOps specialist | 7 | 15 | 2 | Slightly satisfied | Very satisfied | Very confident | Not sure | Yes | I’m not actively looking, but I am open to new... | 1-2 years ago | Write code by hand (e.g., on a whiteboard);Int... | No | Specific department or team I'd be working on;... | I was preparing for a job search | INR | Indian rupee | 400000.0 | Yearly | 5597.0 | 7.0 | There is a schedule and/or spec (made by me or... | Meetings;Time spent commuting | Less than once per month / Never | Other place, such as a coworking space or cafe | Average | No | NaN | Yes, it's not part of our process but the deve... | The CTO, CIO, or other management purchase new... | I have little or no influence | Bash/Shell/PowerShell;C;C++;HTML/CSS;Java;Java... | HTML/CSS;JavaScript;Python | Elasticsearch;Firebase;MariaDB;MongoDB;MySQL;O... | Firebase;PostgreSQL;Redis;Other(s): | Arduino;AWS;Heroku;Linux;MacOS;Raspberry Pi;Wo... | AWS;Docker;Heroku;Kubernetes;Linux;MacOS;WordP... | Django;Express;Flask;jQuery | Express;Flask;jQuery;React.js;Vue.js | Node.js | Node.js | Notepad++;Visual Studio Code | MacOS | Testing | Not at all | Useful for immutable record keeping outside of... | Yes | Also Yes | What? | YouTube | In real life (in person) | Username | 2012 | Daily or almost daily | Find answers to specific questions;Learn how t... | 3-5 times per week | Stack Overflow was slightly faster | 11-30 minutes | Yes | Less than once per month or monthly | No, I knew that Stack Overflow had a job board... | No, and I don't know what those are | Yes, definitely | Just as welcome now as I felt last year | Tech articles written by other developers;Tech... | 23.0 | Man | No | NaN | South Asian | No | Too long | Easy |
|---|
| 65 | I am a developer by profession | Yes | Never | NaN | Employed full-time | India | No | Bachelor’s degree (BA, BS, B.Eng., etc.) | Information systems, information technology, o... | NaN | 20 to 99 employees | Developer, front-end;Developer, mobile | 2 | 17 | 2 | Very satisfied | Very satisfied | Very confident | No | Not sure | I’m not actively looking, but I am open to new... | Less than a year ago | Write any code;Solve a brain-teaser style puzz... | No | Languages, frameworks, and other technologies ... | My job status changed (promotion, new job, etc.) | INR | Indian rupee | NaN | Monthly | NaN | 48.0 | There's no schedule or spec; I work on what se... | NaN | About half the time | Office | Average | Yes, because I see value in code review | NaN | Yes, it's not part of our process but the deve... | Not sure | NaN | Assembly;C;C++;C#;HTML/CSS;Java | Kotlin | Firebase;MySQL;Oracle;SQLite | Firebase;SQLite | Android | Android | ASP.NET | NaN | NaN | NaN | Android Studio;IntelliJ | Linux-based | NaN | NaN | NaN | Yes | Yes | What? | WhatsApp | In real life (in person) | NaN | 2017 | Multiple times per day | Find answers to specific questions | More than 10 times per week | Stack Overflow was slightly faster | 11-30 minutes | Yes | A few times per week | No, I knew that Stack Overflow had a job board... | No, and I don't know what those are | Not sure | A lot more welcome now than last year | NaN | 21.0 | Man | No | NaN | NaN | Yes | Appropriate in length | Neither easy nor difficult |
|---|
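The delimited-text writers follow the same round-trip pattern as the Excel example: the index is written as an ordinary first column and has to be named again with `index_col` when reading. The sketch below illustrates this with a TSV file in a temporary directory; the `toy` frame and the file name `toy.tsv` are hypothetical stand-ins, not files from this article.

```python
import os
import tempfile

import pandas as pd

# Hypothetical two-row frame; the values are invented for illustration.
toy = pd.DataFrame(
    {"Country": ["India", "India"], "ConvertedComp": [13293.0, 5597.0]},
    index=pd.Index([10, 50], name="Respondent"),
)

with tempfile.TemporaryDirectory() as tmp:
    tsv_path = os.path.join(tmp, "toy.tsv")
    toy.to_csv(tsv_path, sep="\t")  # same writer as CSV, tab as the separator

    # Naming the index column restores Respondent as the index.
    with_index = pd.read_csv(tsv_path, sep="\t", index_col="Respondent")
    # Without index_col, Respondent comes back as a regular column.
    without_index = pd.read_csv(tsv_path, sep="\t")

print(with_index.index.tolist())
print(without_index.columns.tolist())
```

The same `sep` value has to be passed on both the write and the read side; otherwise each line is parsed as a single column.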
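# orient='records' with lines=True writes one JSON object per line (JSON Lines).
# This orientation does not store the index, so Respondent is lost on reload.
India_df.to_json('data/modified.json', orient='records', lines=True)
test = pd.read_json('data/modified.json', orient='records', lines=True)
test.head()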
| MainBranch | Hobbyist | OpenSourcer | OpenSource | Employment | Country | Student | EdLevel | UndergradMajor | EduOther | OrgSize | DevType | YearsCode | Age1stCode | YearsCodePro | CareerSat | JobSat | MgrIdiot | MgrMoney | MgrWant | JobSeek | LastHireDate | LastInt | FizzBuzz | JobFactors | ResumeUpdate | CurrencySymbol | CurrencyDesc | CompTotal | CompFreq | ConvertedComp | WorkWeekHrs | WorkPlan | WorkChallenge | WorkRemote | WorkLoc | ImpSyn | CodeRev | CodeRevHrs | UnitTests | PurchaseHow | PurchaseWhat | LanguageWorkedWith | LanguageDesireNextYear | DatabaseWorkedWith | DatabaseDesireNextYear | PlatformWorkedWith | PlatformDesireNextYear | WebFrameWorkedWith | WebFrameDesireNextYear | MiscTechWorkedWith | MiscTechDesireNextYear | DevEnviron | OpSys | Containers | BlockchainOrg | BlockchainIs | BetterLife | ITperson | OffOn | SocialMedia | Extraversion | ScreenName | SOVisit1st | SOVisitFreq | SOVisitTo | SOFindAnswer | SOTimeSaved | SOHowMuchTime | SOAccount | SOPartFreq | SOJobs | EntTeams | SOComm | WelcomeChange | SONewContent | Age | Gender | Trans | Sexuality | Ethnicity | Dependents | SurveyLength | SurveyEase |
|---|
| 0 | I code primarily as a hobby | Yes | Less than once per year | OSS is, on average, of HIGHER quality than pro... | Not employed, but looking for work | India | None | Bachelor’s degree (BA, BS, B.Eng., etc.) | Computer science, computer engineering, or sof... | Taught yourself a new language, framework, or ... | None | Developer, back-end;Engineer, site reliability | 8 | 16 | None | None | None | None | None | None | None | None | None | None | None | None | None | None | NaN | None | NaN | NaN | None | None | None | None | None | None | NaN | None | None | None | Bash/Shell/PowerShell;C;C++;HTML/CSS;Java;Java... | Bash/Shell/PowerShell;C;C++;Elixir;Erlang;Go;P... | Cassandra;Elasticsearch;MongoDB;MySQL;Oracle;R... | Cassandra;DynamoDB;Elasticsearch;Firebase;Mong... | AWS;Docker;Heroku;Linux;MacOS;Slack | Android;Arduino;AWS;Docker;Google Cloud Platfo... | Express;Flask;React.js;Spring | Django;Express;Flask;React.js;Vue.js | Hadoop;Node.js;Pandas | Ansible;Apache Spark;Chef;Hadoop;Node.js;Panda... | Atom;IntelliJ;IPython / Jupyter;PyCharm;Visual... | Linux-based | Development;Testing;Production;Outside of work... | None | Useful across many domains and could change ma... | Yes | SIGH | Yes | YouTube | In real life (in person) | Handle | 2012 | A few times per week | Find answers to specific questions;Learn how t... | Less than once per week | Stack Overflow was slightly faster | 11-30 minutes | Yes | Less than once per month or monthly | Yes | No, and I don't know what those are | Yes, definitely | A lot more welcome now than last year | Tech articles written by other developers;Indu... | 24.0 | Man | No | Straight / Heterosexual | None | None | Appropriate in length | Neither easy nor difficult |
|---|
| 1 | I am a developer by profession | Yes | Once a month or more often | OSS is, on average, of HIGHER quality than pro... | Employed full-time | India | No | Master’s degree (MA, MS, M.Eng., MBA, etc.) | None | None | 10,000 or more employees | Data or business analyst;Data scientist or mac... | 12 | 20 | 10 | Slightly dissatisfied | Slightly dissatisfied | Somewhat confident | Yes | Yes | I’m not actively looking, but I am open to new... | 3-4 years ago | None | No | Languages, frameworks, and other technologies ... | None | INR | Indian rupee | 950000.0 | Yearly | 13293.0 | 70.0 | There's no schedule or spec; I work on what se... | None | A few days each month | Home | Far above average | Yes, because I see value in code review | 4.0 | Yes, it's part of our process | None | None | C#;Go;JavaScript;Python;R;SQL | C#;Go;JavaScript;Kotlin;Python;R;SQL | Elasticsearch;MongoDB;Microsoft SQL Server;MyS... | Elasticsearch;MongoDB;Microsoft SQL Server | Linux;Windows | Android;Linux;Raspberry Pi;Windows | Angular/Angular.js;ASP.NET;Django;Express;Flas... | Angular/Angular.js;ASP.NET;Django;Express;Flas... | .NET;Node.js;Pandas;Torch/PyTorch | .NET;Node.js;TensorFlow;Torch/PyTorch | Android Studio;Eclipse;IPython / Jupyter;Notep... | Windows | None | Not at all | Useful for immutable record keeping outside of... | No | Yes | Yes | YouTube | Neither | Screen Name | None | Multiple times per day | Find answers to specific questions;Get a sense... | 3-5 times per week | They were about the same | None | Yes | A few times per month or weekly | Yes | No, and I don't know what those are | Yes, somewhat | Somewhat less welcome now than last year | Tech articles written by other developers;Tech... | NaN | None | None | None | None | Yes | Too long | Difficult |
|---|
| 2 | I am a student who is learning to code | Yes | Never | OSS is, on average, of HIGHER quality than pro... | Not employed, but looking for work | India | Yes, full-time | Secondary school (e.g. American high school, G... | None | Taken an online course in programming or softw... | None | Student | 3 | 13 | None | None | None | None | None | None | I’m not actively looking, but I am open to new... | I've never had a job | None | None | Industry that I'd be working in;Languages, fra... | Something else changed (education, award, medi... | None | None | NaN | None | NaN | NaN | None | None | None | None | None | None | NaN | None | None | None | Assembly;Bash/Shell/PowerShell;C;C++;HTML/CSS;... | Assembly;Bash/Shell/PowerShell;C;C++;C#;Go;HTM... | MariaDB;MySQL;Oracle;SQLite | MariaDB;MongoDB;Microsoft SQL Server;MySQL;Ora... | Linux;Windows | Android;Google Cloud Platform;iOS;Linux;MacOS;... | None | Angular/Angular.js;ASP.NET;Django;Drupal;jQuer... | None | .NET;.NET Core;Node.js;TensorFlow;Unity 3D;Unr... | Atom;NetBeans;Notepad++;Sublime Text;Vim | Linux-based | Development | None | None | Yes | Yes | What? | YouTube | In real life (in person) | None | 2018 | Daily or almost daily | Find answers to specific questions;Learn how t... | More than 10 times per week | They were about the same | None | Yes | Less than once per month or monthly | Yes | No, I've heard of them, but I am not part of a... | Yes, somewhat | Just as welcome now as I felt last year | Tech articles written by other developers;Indu... | 20.0 | Man | No | None | None | Yes | Too long | Neither easy nor difficult |
|---|
| 3 | I am a developer by profession | Yes | Once a month or more often | OSS is, on average, of LOWER quality than prop... | Employed full-time | India | No | Bachelor’s degree (BA, BS, B.Eng., etc.) | Another engineering discipline (ex. civil, ele... | Received on-the-job training in software devel... | 10,000 or more employees | Developer, back-end;DevOps specialist | 7 | 15 | 2 | Slightly satisfied | Very satisfied | Very confident | Not sure | Yes | I’m not actively looking, but I am open to new... | 1-2 years ago | Write code by hand (e.g., on a whiteboard);Int... | No | Specific department or team I'd be working on;... | I was preparing for a job search | INR | Indian rupee | 400000.0 | Yearly | 5597.0 | 7.0 | There is a schedule and/or spec (made by me or... | Meetings;Time spent commuting | Less than once per month / Never | Other place, such as a coworking space or cafe | Average | No | NaN | Yes, it's not part of our process but the deve... | The CTO, CIO, or other management purchase new... | I have little or no influence | Bash/Shell/PowerShell;C;C++;HTML/CSS;Java;Java... | HTML/CSS;JavaScript;Python | Elasticsearch;Firebase;MariaDB;MongoDB;MySQL;O... | Firebase;PostgreSQL;Redis;Other(s): | Arduino;AWS;Heroku;Linux;MacOS;Raspberry Pi;Wo... | AWS;Docker;Heroku;Kubernetes;Linux;MacOS;WordP... | Django;Express;Flask;jQuery | Express;Flask;jQuery;React.js;Vue.js | Node.js | Node.js | Notepad++;Visual Studio Code | MacOS | Testing | Not at all | Useful for immutable record keeping outside of... | Yes | Also Yes | What? | YouTube | In real life (in person) | Username | 2012 | Daily or almost daily | Find answers to specific questions;Learn how t... | 3-5 times per week | Stack Overflow was slightly faster | 11-30 minutes | Yes | Less than once per month or monthly | No, I knew that Stack Overflow had a job board... | No, and I don't know what those are | Yes, definitely | Just as welcome now as I felt last year | Tech articles written by other developers;Tech... | 23.0 | Man | No | None | South Asian | No | Too long | Easy |
|---|
| 4 | I am a developer by profession | Yes | Never | None | Employed full-time | India | No | Bachelor’s degree (BA, BS, B.Eng., etc.) | Information systems, information technology, o... | None | 20 to 99 employees | Developer, front-end;Developer, mobile | 2 | 17 | 2 | Very satisfied | Very satisfied | Very confident | No | Not sure | I’m not actively looking, but I am open to new... | Less than a year ago | Write any code;Solve a brain-teaser style puzz... | No | Languages, frameworks, and other technologies ... | My job status changed (promotion, new job, etc.) | INR | Indian rupee | NaN | Monthly | NaN | 48.0 | There's no schedule or spec; I work on what se... | None | About half the time | Office | Average | Yes, because I see value in code review | NaN | Yes, it's not part of our process but the deve... | Not sure | None | Assembly;C;C++;C#;HTML/CSS;Java | Kotlin | Firebase;MySQL;Oracle;SQLite | Firebase;SQLite | Android | Android | ASP.NET | None | None | None | Android Studio;IntelliJ | Linux-based | None | None | None | Yes | Yes | What? | WhatsApp | In real life (in person) | None | 2017 | Multiple times per day | Find answers to specific questions | More than 10 times per week | Stack Overflow was slightly faster | 11-30 minutes | Yes | A few times per week | No, I knew that Stack Overflow had a job board... | No, and I don't know what those are | Not sure | A lot more welcome now than last year | None | 21.0 | Man | No | None | None | Yes | Appropriate in length | Neither easy nor difficult |
|---|
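from sqlalchemy import create_engine
import psycopg2  # PostgreSQL driver that SQLAlchemy loads for postgresql:// URLs
engine = create_engine('postgresql://dbuser:dbpass@localhost:5432/sample_db')
# if_exists='replace': if the table already exists, drop it and write the data again
India_df.to_sql('sample_table', engine, if_exists='replace')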
---------------------------------------------------------------------------
OperationalError Traceback (most recent call last)
... (intermediate pandas, SQLAlchemy and psycopg2 frames omitted) ...
OperationalError: (psycopg2.OperationalError) connection to server at "localhost" (::1), port 5432 failed: Connection refused (0x0000274D/10061)
Is the server running on that host and accepting TCP/IP connections?
connection to server at "localhost" (127.0.0.1), port 5432 failed: Connection refused (0x0000274D/10061)
Is the server running on that host and accepting TCP/IP connections?
(Background on this error at: http://sqlalche.me/e/e3q8)
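The "Connection refused" message above means nothing was listening on localhost:5432 when to_sql tried to connect. As a rough sketch (not part of the original notebook), a plain TCP probe with the standard library can confirm that before the DataFrame is written. The helper name server_is_reachable is hypothetical, and the host and port are simply the ones from the connection URL.

```python
import socket


def server_is_reachable(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError)
        # when nothing is listening on that address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Host and port taken from the create_engine URL above.
if server_is_reachable("localhost", 5432):
    print("PostgreSQL port is open; to_sql should be able to connect.")
else:
    print("Nothing is listening on localhost:5432; start the PostgreSQL server first.")
```

This only checks that the port accepts connections. Wrong credentials or a missing sample_db database would still surface as a separate error from to_sql.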
sql_df = pd.read_sql('sample_table',engine,index_col='Respondent')
sql_df.head()
sql_df = pd.read_sql_query('SELECT * FROM sample_table',engine,index_col='Respondent')
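The PostgreSQL calls above need a running server, so here is a minimal offline sketch of the same write-then-read round trip. It swaps in an in-memory SQLite database through the standard-library sqlite3 module, which is an assumption made for illustration and not the setup used in this post. india_df is a small hypothetical stand-in for India_df. With a raw sqlite3 connection pandas only accepts SQL queries, so the sketch reads back with read_sql_query rather than passing a bare table name to read_sql.

```python
import sqlite3

import pandas as pd

# Hypothetical stand-in for India_df: two rows indexed by Respondent.
india_df = pd.DataFrame(
    {"Country": ["India", "India"], "ConvertedComp": [5597.0, None]},
    index=pd.Index([3, 4], name="Respondent"),
)

# In-memory SQLite database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# The index is stored as a regular column named "Respondent".
india_df.to_sql("sample_table", conn, if_exists="replace")

# Writing again with if_exists="replace" drops and recreates the table
# instead of raising "Table 'sample_table' already exists."
india_df.to_sql("sample_table", conn, if_exists="replace")

# Read the table back and restore Respondent as the index.
sql_df = pd.read_sql_query(
    "SELECT * FROM sample_table", conn, index_col="Respondent"
)
print(sql_df)

conn.close()
```

With the default if_exists="fail", the second to_sql call would raise a ValueError; "append" would add the rows to the existing table instead.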
posts_df = pd.read_json('https://raw.githubusercontent.com/CoreyMSchafer/code_snippets/master/Python/Flask_Blog/snippets/posts.json')
posts_df.head()
| | title | content | user_id |
|---|---|---|---|
| 0 | My Updated Post | My first updated post!\r\n\r\nThis is exciting! | 1 |
| 1 | A Second Post | This is a post from a different user... | 2 |
| 2 | Top 5 Programming Lanaguages | Te melius apeirian postulant cum, labitur admo... | 1 |
| 3 | Sublime Text Tips and Tricks | Ea vix dico modus voluptatibus, mel iudico sua... | 1 |
| 4 | Best Python IDEs | Elit contentiones nam no, sea ut consul adipis... | 1 |
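read_json accepts a file-like object as well as a URL, so the same call can be tried without network access. The sketch below is only an illustration: the two records are hypothetical and merely shaped like the entries in posts.json (title, content, user_id).

```python
from io import StringIO

import pandas as pd

# Hypothetical records shaped like posts.json; not the real file contents.
raw_json = """
[
  {"title": "My Updated Post", "content": "My first updated post!", "user_id": 1},
  {"title": "A Second Post", "content": "This is a post from a different user...", "user_id": 2}
]
"""

# Wrapping the string in StringIO lets read_json treat it like a file.
posts_df = pd.read_json(StringIO(raw_json))
print(posts_df.head())
```

A list of JSON objects becomes one row per object, with the keys as column names and a default integer index.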